# Battle of Neighborhoods: Socioeconomic Factors Versus Top Most Common Venue Types 


## Introduction

The City of Chicago is interested in how socioeconomic factors affect the prevalence of different venue types within clusters of neighborhoods that share similar socioeconomic features. That is to say, will different socioeconomic clusters have approximately the same top 5 venue types, or will these vary widely between clusters?

## Data

The data used in this analysis are Chicago city statistics on the poverty rate, hardship index, education, population density, and income of neighborhoods within Chicago, which will be used to construct the clusters. The Foursquare API will be used to find the 5 most common venue categories within each cluster, which will then be compared between clusters.

### Data Import

In [88]:
import pandas as pd
import numpy as np
import geocoder
import requests
import folium

from sklearn import preprocessing
from sklearn.cluster import KMeans
from matplotlib import cm
from matplotlib import colors

In [3]:
# neighborhood data, specifically for neighborhood names and population
neighborhoods = pd.read_html('https://en.wikipedia.org/wiki/Community_areas_in_Chicago')[0]
neighborhoods.head()

Unnamed: 0,Number[8],Name[8],2017[9],Area (sq mi.)[10],Area (km2),2017density (/sq mi.),2017density (/km2)
0,1,Rogers Park,55062,1.84,4.77,29925.0,11554.11
1,2,West Ridge,76215,3.53,9.14,21590.65,8336.2
2,3,Uptown,57973,2.32,6.01,24988.36,9648.06
3,4,Lincoln Square,41715,2.56,6.63,16294.92,6291.5
4,5,North Center,35789,2.05,5.31,17458.05,6740.59


In [4]:
neighborhoods.drop(  # remove irrelevant columns
    columns=['Number[8]', '2017[9]', 'Area (sq mi.)[10]', 'Area (km2)', '2017density (/km2)'],
    inplace=True)
neighborhoods.columns = ['Neighborhood', 'Density (/sq mi.)']  # rename for readability
neighborhoods.head()
neighborhoods.head()

Unnamed: 0,Neighborhood,Density (/sq mi.)
0,Rogers Park,29925.0
1,West Ridge,21590.65
2,Uptown,24988.36
3,Lincoln Square,16294.92
4,North Center,17458.05


In [5]:
# economic data
economics = pd.read_csv('Per_Capita_Income.csv')
economics.head()

Unnamed: 0,Community Area Number,COMMUNITY AREA NAME,PERCENT OF HOUSING CROWDED,PERCENT HOUSEHOLDS BELOW POVERTY,PERCENT AGED 16+ UNEMPLOYED,PERCENT AGED 25+ WITHOUT HIGH SCHOOL DIPLOMA,PERCENT AGED UNDER 18 OR OVER 64,PER CAPITA INCOME,HARDSHIP INDEX
0,1.0,Rogers Park,7.7,23.6,8.7,18.2,27.5,23939,39.0
1,2.0,West Ridge,7.8,17.2,8.8,20.8,38.5,23040,46.0
2,3.0,Uptown,3.8,24.0,8.9,11.8,22.2,35787,20.0
3,4.0,Lincoln Square,3.4,10.9,8.2,13.4,25.5,37524,17.0
4,5.0,North Center,0.3,7.5,5.2,4.5,26.2,57123,6.0


In [6]:
economics.drop(columns=['Community Area Number', 'PERCENT AGED UNDER 18 OR OVER 64'], inplace=True)  # drop irrelevant columns
economics.columns = ['Neighborhood',
                     'Percent Housing Crowded',
                     'Percent Households Below Poverty',
                     'Percent Aged 16+ Unemployed',
                     'Percent Aged 25+ Without GED',
                     'Per Capita Income',
                     'Hardship Index'
                    ]
economics.head()

Unnamed: 0,Neighborhood,Percent Housing Crowded,Percent Households Below Poverty,Percent Aged 16+ Unemployed,Percent Aged 25+ Without GED,Per Capita Income,Hardship Index
0,Rogers Park,7.7,23.6,8.7,18.2,23939,39.0
1,West Ridge,7.8,17.2,8.8,20.8,23040,46.0
2,Uptown,3.8,24.0,8.9,11.8,35787,20.0
3,Lincoln Square,3.4,10.9,8.2,13.4,37524,17.0
4,North Center,0.3,7.5,5.2,4.5,57123,6.0


In [7]:
data = neighborhoods.merge(economics, on='Neighborhood', how='left')
data.head()

Unnamed: 0,Neighborhood,Density (/sq mi.),Percent Housing Crowded,Percent Households Below Poverty,Percent Aged 16+ Unemployed,Percent Aged 25+ Without GED,Per Capita Income,Hardship Index
0,Rogers Park,29925.0,7.7,23.6,8.7,18.2,23939.0,39.0
1,West Ridge,21590.65,7.8,17.2,8.8,20.8,23040.0,46.0
2,Uptown,24988.36,3.8,24.0,8.9,11.8,35787.0,20.0
3,Lincoln Square,16294.92,3.4,10.9,8.2,13.4,37524.0,17.0
4,North Center,17458.05,0.3,7.5,5.2,4.5,57123.0,6.0


In [8]:
data.dropna(inplace=True)  # drop NaN rows 
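
Because the merge above is a left join on the neighborhood name, any name that is spelled differently in the two tables ends up with NaN values and is then removed by `dropna`. The sketch below shows one way to list such rows before dropping them, using the `indicator` option of `merge`. The two small tables and their names are invented for illustration and are not the real data.

```python
import pandas as pd

# Toy stand-ins for the two tables; the names and values are invented.
left = pd.DataFrame({
    'Neighborhood': ['Alpha', 'Beta', 'Gamma'],
    'Density (/sq mi.)': [100.0, 200.0, 300.0],
})
right = pd.DataFrame({
    'Neighborhood': ['Alpha', 'Gamma', 'Delta'],
    'Per Capita Income': [10, 30, 40],
})

# indicator=True adds a '_merge' column saying where each row came from.
checked = left.merge(right, on='Neighborhood', how='left', indicator=True)

# Rows marked 'left_only' found no partner and would be removed by dropna().
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Neighborhood'].tolist()
print(unmatched)
```

Inspecting the unmatched names makes it possible to decide whether they should be renamed to match or left out of the analysis.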

In [9]:
# Get lng and lat using geocoder

latitudes = []
longitudes = []
for neighborhood in data['Neighborhood']:
    location = None
    
    while location is None:
        geo = geocoder.arcgis('{}, Chicago, Illinois'.format(neighborhood))
        location = geo.json
        
    latitudes.append(location['lat'])
    longitudes.append(location['lng'])
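
The `while location is None` loop above retries indefinitely if a lookup never succeeds. A minimal sketch of a bounded retry follows. The helper name `geocode_with_retry` and its parameters are hypothetical, and the lookup is passed in as a callable so the example runs without network access. In the notebook the callable could be something like `lambda q: geocoder.arcgis(q).json`.

```python
import time


def geocode_with_retry(lookup, query, max_attempts=5, delay_seconds=0.0):
    """Call lookup(query) until it returns a result or attempts run out.

    `lookup` is any callable that returns a dict with 'lat' and 'lng' keys,
    or None when the lookup fails.
    """
    for _ in range(max_attempts):
        location = lookup(query)
        if location is not None:
            return location
        time.sleep(delay_seconds)  # pause before trying again
    return None  # the caller decides how to handle a permanent failure


# Stand-in lookup that fails twice before succeeding (no network needed).
calls = {'count': 0}


def flaky_lookup(query):
    calls['count'] += 1
    if calls['count'] < 3:
        return None
    return {'lat': 42.0, 'lng': -87.7}  # made-up coordinates


result = geocode_with_retry(flaky_lookup, 'Example, Chicago, Illinois')
print(result, calls['count'])
```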

In [10]:
data['Longitude'] = longitudes
data['Latitude'] = latitudes

data.head()

Unnamed: 0,Neighborhood,Density (/sq mi.),Percent Housing Crowded,Percent Households Below Poverty,Percent Aged 16+ Unemployed,Percent Aged 25+ Without GED,Per Capita Income,Hardship Index,Longitude,Latitude
0,Rogers Park,29925.0,7.7,23.6,8.7,18.2,23939.0,39.0,-87.66619,42.00897
1,West Ridge,21590.65,7.8,17.2,8.8,20.8,23040.0,46.0,-87.69266,41.99948
2,Uptown,24988.36,3.8,24.0,8.9,11.8,35787.0,20.0,-87.66936,41.96538
3,Lincoln Square,16294.92,3.4,10.9,8.2,13.4,37524.0,17.0,-87.68914,41.97583
4,North Center,17458.05,0.3,7.5,5.2,4.5,57123.0,6.0,-87.68142,41.95411


In [11]:
# Foursquare Venue lookup

import os
# credentials are read from environment variables (the variable names are an assumption) rather than hard-coded
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']  # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']  # your Foursquare Secret
VERSION = '20180604'

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return relevant information for each venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
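
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, so a venue returned with an empty category list would raise an `IndexError`. The sketch below shows one possible defensive version of the flattening step. The helper name `extract_venue_rows` is hypothetical, and the sample payload is hand-written to mirror the `items` structure used above rather than taken from a real response.

```python
def extract_venue_rows(name, lat, lng, items):
    """Flatten explore-style items into rows, skipping venues with no category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        if not categories:
            continue  # no category to count for this venue
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            categories[0]['name'],
        ))
    return rows


# Hand-written payload in the same shape as the 'items' list used above.
sample_items = [
    {'venue': {'name': 'Cafe A', 'location': {'lat': 1.0, 'lng': 2.0},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Mystery Spot', 'location': {'lat': 1.1, 'lng': 2.1},
               'categories': []}},
]
rows = extract_venue_rows('Example', 1.0, 2.0, sample_items)
print(rows)
```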

In [13]:
venues = getNearbyVenues(data['Neighborhood'], data['Latitude'], data['Longitude'])
venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Rogers Park,42.00897,-87.66619,Morse Fresh Market,42.008087,-87.667041,Grocery Store
1,Rogers Park,42.00897,-87.66619,Rogers Park Social,42.00736,-87.666265,Bar
2,Rogers Park,42.00897,-87.66619,The Common Cup,42.007797,-87.667901,Coffee Shop
3,Rogers Park,42.00897,-87.66619,Lifeline Theatre,42.007372,-87.666284,Theater
4,Rogers Park,42.00897,-87.66619,Rogers Park Provisions,42.007528,-87.666193,Gift Shop


## Methodology

The goal is to look at how socioeconomic factors affect the types of venues that are present within a cluster of neighborhoods, clustered by said socioeconomic factors.

The clusters will therefore be determined with the k-means clustering algorithm, using the factors within the main data table called 'data'. Since there are no 'true' clusters, the model is unsupervised and there are no true cluster labels that could be used to evaluate its 'accuracy'. The entire dataset will therefore be used to fit the model, and the value of K (the number of clusters) will be set to 5, which should provide a reasonable grouping of neighborhoods based on socioeconomic factors given the total number of neighborhoods (73).

Before the neighborhoods can be clustered on the basis of the socio-economic features, these will have to be standardized so that each will have a zero mean and unit variance.
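
As a small illustration of what standardization does, the sketch below applies the z-score formula by hand to made-up values for two features on very different scales. It uses the population standard deviation (`ddof=0`), which is also what scikit-learn's `StandardScaler` uses.

```python
import numpy as np

# Made-up values for two features on very different scales.
X = np.array([
    [20000.0, 5.0],
    [10000.0, 15.0],
    [30000.0, 10.0],
])

# z = (x - mean) / std, computed column by column (np.std defaults to ddof=0).
z = (X - X.mean(axis=0)) / X.std(axis=0)

print(z.mean(axis=0))  # approximately 0 for each column
print(z.std(axis=0))   # approximately 1 for each column
```

Without this step, a feature measured in tens of thousands (such as income) would dominate the distance calculations over features measured in percentages.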

Afterwards, the standardized data (with the exception of the neighborhood names, longitude and latitude) will be clustered with the k-means algorithm.

Once the cluster labels are determined for each neighborhood, one-hot encoding will be performed on the venues by venue category. The results will be grouped by cluster and then sorted to determine the top 5 most common venue types within each cluster.
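
The encode, group, and sort steps can be sketched on a toy table as follows. The venue rows are invented and only two top categories are kept so the result is easy to check by hand.

```python
import pandas as pd

# Invented venue rows: one row per venue, labelled with its cluster.
toy = pd.DataFrame({
    'Cluster': [0, 0, 0, 1, 1, 1, 1, 1, 1],
    'Venue Category': ['Bar', 'Bar', 'Park',
                       'Park', 'Park', 'Park', 'Cafe', 'Cafe', 'Bar'],
})

# One indicator column per category.
onehot = pd.get_dummies(toy['Venue Category'], dtype=float)
onehot.insert(0, 'Cluster', toy['Cluster'])

# The mean of each indicator column is the share of venues in that category.
shares = onehot.groupby('Cluster').mean()

# Sort each cluster's shares and keep the most frequent categories.
top2 = {
    int(cluster): row.sort_values(ascending=False).index[:2].tolist()
    for cluster, row in shares.iterrows()
}
print(top2)
```

Taking the mean rather than the sum gives each category's share of a cluster's venues, which keeps clusters with many venues comparable to clusters with few.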

#### Data Standardization

In [14]:
feature_data = data.drop(columns=['Neighborhood', 'Longitude', 'Latitude'])
features_norm = preprocessing.StandardScaler().fit_transform(feature_data.astype(float))

features_norm[:5]

array([[ 2.37464751,  0.81506787,  0.15055194, -0.88724317, -0.18917909,
        -0.09236505, -0.37494222],
       [ 1.21463765,  0.84360501, -0.40068525, -0.87405198,  0.03246407,
        -0.15311328, -0.12899167],
       [ 1.68754521, -0.29788092,  0.18500426, -0.86086078, -0.73476226,
         0.70824127, -1.04252226],
       [ 0.47755568, -0.41202951, -0.94330936, -0.95319913, -0.59836647,
         0.82561578, -1.14792963],
       [ 0.63944501, -1.29668111, -1.23615412, -1.34893488, -1.35706806,
         2.14998136, -1.53442334]])

#### Calculating Clusters

In [15]:
K = 5
clusters = KMeans(n_clusters=K, random_state=11).fit(features_norm)
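
The value K = 5 is fixed in advance here. One common heuristic for sanity-checking such a choice, which is not part of the original analysis, is to compare the within-cluster sum of squares (`inertia_`) across several values of K and look for the point where improvements level off. The sketch below does this on synthetic data. The three blobs are an assumption standing in for `features_norm`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the standardized feature matrix: three loose blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(25, 2)) for c in centers])

# Fit k-means for several k and record the within-cluster sum of squares.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=11).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

On data like this the inertia drops sharply up to the true number of blobs and only slowly afterwards. On real socioeconomic data the elbow may be much less clear.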

In [16]:
data.insert(1, 'Cluster', clusters.labels_)
data.head()

Unnamed: 0,Neighborhood,Cluster,Density (/sq mi.),Percent Housing Crowded,Percent Households Below Poverty,Percent Aged 16+ Unemployed,Percent Aged 25+ Without GED,Per Capita Income,Hardship Index,Longitude,Latitude
0,Rogers Park,0,29925.0,7.7,23.6,8.7,18.2,23939.0,39.0,-87.66619,42.00897
1,West Ridge,0,21590.65,7.8,17.2,8.8,20.8,23040.0,46.0,-87.69266,41.99948
2,Uptown,0,24988.36,3.8,24.0,8.9,11.8,35787.0,20.0,-87.66936,41.96538
3,Lincoln Square,2,16294.92,3.4,10.9,8.2,13.4,37524.0,17.0,-87.68914,41.97583
4,North Center,3,17458.05,0.3,7.5,5.2,4.5,57123.0,6.0,-87.68142,41.95411


In [17]:
neighborhood_clusters = data[['Neighborhood', 'Cluster']]
venue_cluster = venues.merge(neighborhood_clusters, on='Neighborhood', how='left')
venue_cluster.drop(columns='Neighborhood', inplace=True)

cols = venue_cluster.columns
cols = cols[cols != 'Cluster'].insert(0, 'Cluster')

venue_cluster = venue_cluster[cols]
venue_cluster.head()

Unnamed: 0,Cluster,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,0,42.00897,-87.66619,Morse Fresh Market,42.008087,-87.667041,Grocery Store
1,0,42.00897,-87.66619,Rogers Park Social,42.00736,-87.666265,Bar
2,0,42.00897,-87.66619,The Common Cup,42.007797,-87.667901,Coffee Shop
3,0,42.00897,-87.66619,Lifeline Theatre,42.007372,-87.666284,Theater
4,0,42.00897,-87.66619,Rogers Park Provisions,42.007528,-87.666193,Gift Shop


In [65]:
# onehot encoding
onehot = pd.get_dummies(venue_cluster['Venue Category'], prefix='', prefix_sep='')
onehot.insert(0, 'Cluster', venue_cluster['Cluster'])
onehot.head()

Unnamed: 0,Cluster,ATM,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,American Restaurant,Antique Shop,...,Video Store,Vietnamese Restaurant,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [66]:
# group by cluster

clusters_grouped = onehot.groupby('Cluster').mean().reset_index()
clusters_grouped.head()

Unnamed: 0,Cluster,ATM,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,American Restaurant,Antique Shop,...,Video Store,Vietnamese Restaurant,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio
0,0,0.0,0.005348,0.002674,0.0,0.0,0.0,0.0,0.013369,0.008021,...,0.008021,0.005348,0.0,0.0,0.002674,0.0,0.0,0.010695,0.002674,0.002674
1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030888,0.0,...,0.019305,0.0,0.0,0.003861,0.0,0.0,0.003861,0.003861,0.0,0.0
2,2,0.002564,0.007692,0.0,0.002564,0.005128,0.012821,0.028205,0.015385,0.0,...,0.010256,0.0,0.002564,0.0,0.002564,0.005128,0.0,0.002564,0.002564,0.010256
3,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.003846,...,0.007692,0.007692,0.0,0.0,0.0,0.0,0.0,0.003846,0.003846,0.007692
4,4,0.004219,0.0,0.0,0.0,0.0,0.0,0.0,0.025316,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016878,0.004219,0.0


In [69]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [70]:
return_most_common_venues(clusters_grouped.iloc[0,:], 5)

array(['Indian Restaurant', 'Mexican Restaurant', 'Bar', 'Coffee Shop',
       'Pizza Place'], dtype=object)

In [71]:
most_popular_5 = [
    return_most_common_venues(clusters_grouped.iloc[x,:], 5) for x in range(clusters_grouped.shape[0])
]
most_popular_5

[array(['Indian Restaurant', 'Mexican Restaurant', 'Bar', 'Coffee Shop',
        'Pizza Place'], dtype=object),
 array(['Mexican Restaurant', 'Pizza Place', 'Sandwich Place',
        'Coffee Shop', 'American Restaurant'], dtype=object),
 array(['Coffee Shop', 'Park', 'Bar', 'Sandwich Place', 'Pizza Place'],
       dtype=object),
 array(['Chinese Restaurant', 'Pizza Place', 'Bar', 'Coffee Shop',
        'Sandwich Place'], dtype=object),
 array(['Park', 'Fast Food Restaurant', 'Bus Station', 'Grocery Store',
        'Liquor Store'], dtype=object)]

In [86]:
economics_by_cluster = data.groupby('Cluster').mean(numeric_only=True).reset_index()  # average only the numeric columns
economics_by_cluster.drop(columns=['Longitude', 'Latitude'], inplace=True)
economics_by_cluster['Cluster'] += 1  # Incrementing Clusters by 1 to be more readable (1 to 5 instead of 0 to 4)
economics_by_cluster.head()

Unnamed: 0,Cluster,Density (/sq mi.),Percent Housing Crowded,Percent Households Below Poverty,Percent Aged 16+ Unemployed,Percent Aged 25+ Without GED,Per Capita Income,Hardship Index
0,1,22435.25,5.88,17.79,9.93,19.68,26370.0,35.4
1,2,13864.83,10.123077,21.7,15.369231,39.615385,14731.153846,75.230769
2,3,9278.229524,2.466667,11.709524,10.771429,11.695238,32230.095238,24.333333
3,4,23359.91,1.08,11.58,5.38,4.12,67295.6,4.2
4,5,9284.2,4.416667,34.641667,23.9125,21.358333,15783.958333,73.416667


In [100]:
economics_by_cluster['#1 Popular'] = [x[0] for x in most_popular_5]
economics_by_cluster['#2 Popular'] = [x[1] for x in most_popular_5]
economics_by_cluster['#3 Popular'] = [x[2] for x in most_popular_5]
economics_by_cluster['#4 Popular'] = [x[3] for x in most_popular_5]
economics_by_cluster['#5 Popular'] = [x[4] for x in most_popular_5]
economics_by_cluster

Unnamed: 0,Cluster,Density (/sq mi.),Percent Housing Crowded,Percent Households Below Poverty,Percent Aged 16+ Unemployed,Percent Aged 25+ Without GED,Per Capita Income,Hardship Index,#1 Popular,#2 Popular,#3 Popular,#4 Popular,#5 Popular
0,1,22435.25,5.88,17.79,9.93,19.68,26370.0,35.4,Indian Restaurant,Mexican Restaurant,Bar,Coffee Shop,Pizza Place
1,2,13864.83,10.123077,21.7,15.369231,39.615385,14731.153846,75.230769,Mexican Restaurant,Pizza Place,Sandwich Place,Coffee Shop,American Restaurant
2,3,9278.229524,2.466667,11.709524,10.771429,11.695238,32230.095238,24.333333,Coffee Shop,Park,Bar,Sandwich Place,Pizza Place
3,4,23359.91,1.08,11.58,5.38,4.12,67295.6,4.2,Chinese Restaurant,Pizza Place,Bar,Coffee Shop,Sandwich Place
4,5,9284.2,4.416667,34.641667,23.9125,21.358333,15783.958333,73.416667,Park,Fast Food Restaurant,Bus Station,Grocery Store,Liquor Store


In [101]:
geo = geocoder.arcgis('Chicago, Illinois')

latitude = geo.json['lat']-.05  # minor adjustment to latitude to better center map on neighborhoods
longitude = geo.json['lng']

k_clusters = 5

# initialize the map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# generate colorscheme
x = np.arange(k_clusters)
ys = [i + x + (i*x)**2 for i in range(k_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(
    data['Latitude'],
    data['Longitude'],
    data['Neighborhood'],
    data['Cluster']):
    
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster+1), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

In [105]:
economics_by_cluster.insert(1, 'Color', ['Red', 'Purple', 'Blue', 'Green', 'Orange'])

## Results

### Looking at the Clustering of Neighborhoods

In [113]:
map_clusters

Shown above is a map of the neighborhoods within Chicago, colored by cluster. The clusters are based on the socioeconomic features of each neighborhood, and the averages per cluster are shown below.

In [112]:
economics_by_cluster[economics_by_cluster.columns[:9]]

Unnamed: 0,Cluster,Color,Density (/sq mi.),Percent Housing Crowded,Percent Households Below Poverty,Percent Aged 16+ Unemployed,Percent Aged 25+ Without GED,Per Capita Income,Hardship Index
0,1,Red,22435.25,5.88,17.79,9.93,19.68,26370.0,35.4
1,2,Purple,13864.83,10.123077,21.7,15.369231,39.615385,14731.153846,75.230769
2,3,Blue,9278.229524,2.466667,11.709524,10.771429,11.695238,32230.095238,24.333333
3,4,Green,23359.91,1.08,11.58,5.38,4.12,67295.6,4.2
4,5,Orange,9284.2,4.416667,34.641667,23.9125,21.358333,15783.958333,73.416667


Some key takeaways: cluster \#4 is the wealthiest cluster, having the highest per capita income, the lowest unemployment among people aged 16+, the lowest percentage of households below poverty, and the lowest hardship index. The poorest cluster by poverty rate is cluster \#5, which has the highest percentage of households below poverty and also the highest unemployment. However, it does not have the lowest per capita income or the highest hardship index; those belong to cluster \#2, although the difference is relatively small.

### Most Common/Popular Venue Categories Within Each Cluster

In [124]:
economics_by_cluster[[x for i,x in enumerate(economics_by_cluster.columns) if i not in np.arange(2,7)]]

Unnamed: 0,Cluster,Color,Per Capita Income,Hardship Index,#1 Popular,#2 Popular,#3 Popular,#4 Popular,#5 Popular
0,1,Red,26370.0,35.4,Indian Restaurant,Mexican Restaurant,Bar,Coffee Shop,Pizza Place
1,2,Purple,14731.153846,75.230769,Mexican Restaurant,Pizza Place,Sandwich Place,Coffee Shop,American Restaurant
2,3,Blue,32230.095238,24.333333,Coffee Shop,Park,Bar,Sandwich Place,Pizza Place
3,4,Green,67295.6,4.2,Chinese Restaurant,Pizza Place,Bar,Coffee Shop,Sandwich Place
4,5,Orange,15783.958333,73.416667,Park,Fast Food Restaurant,Bus Station,Grocery Store,Liquor Store


Looking at the most popular venue type (popular in the sense of most common), there is no commonality among the clusters. For the 2nd most popular, however, 'Pizza Place' appears in both the poorest cluster by per capita income (cluster 2) and the wealthiest (cluster 4). Additionally, 'Pizza Place' appears as the 5th most popular for both cluster 1 and cluster 3, so it is in the top 5 of four of the five clusters. It would therefore appear that pizza places are rather socioeconomically neutral, which is unsurprising, as pizza comes in a wide range of prices (depending on quality). The same appears to apply to coffee shops, which appear in 4/5 clusters: as the 4th most popular in three of them and as the \#1 most popular in cluster 3.

What is interesting, however, is the set of clusters in which bars are among the top 5: bars are the 3rd most popular venue type in clusters 1, 3, and 4. This is of particular note because it excludes the poorest clusters; neither cluster 2 nor cluster 5 has bars within its top 5 most popular venue types. This could indicate that the lower socioeconomic status of those clusters prices residents out of bars. Moreover, the 2nd poorest cluster has liquor stores as its 5th most popular venue type (as opposed to bars), whilst the poorest has no venue type centered around alcohol, indicating that its residents are likely priced out. That is not to say there are no bars or liquor stores within the poorest cluster; however, it is likely that the poor socioeconomic status of the cluster has reduced the prominence of such venues.
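
The appearance counts discussed in this section can be checked directly from the top-5 lists shown earlier. The sketch below copies those lists by hand, keyed by the 1-based cluster numbers used in the tables, and counts in how many clusters each category reaches the top 5.

```python
from collections import Counter

# Top-5 lists copied from the results table, keyed by cluster number.
top5 = {
    1: ['Indian Restaurant', 'Mexican Restaurant', 'Bar', 'Coffee Shop', 'Pizza Place'],
    2: ['Mexican Restaurant', 'Pizza Place', 'Sandwich Place', 'Coffee Shop', 'American Restaurant'],
    3: ['Coffee Shop', 'Park', 'Bar', 'Sandwich Place', 'Pizza Place'],
    4: ['Chinese Restaurant', 'Pizza Place', 'Bar', 'Coffee Shop', 'Sandwich Place'],
    5: ['Park', 'Fast Food Restaurant', 'Bus Station', 'Grocery Store', 'Liquor Store'],
}

# Number of clusters in which each category reaches the top 5.
appearances = Counter(cat for cats in top5.values() for cat in cats)

# Clusters whose top 5 includes bars.
clusters_with_bar = sorted(c for c, cats in top5.items() if 'Bar' in cats)

print(appearances.most_common(4))
print(clusters_with_bar)
```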

## Discussion

In looking at the effects of socioeconomic factors, the main effect on the types of venues within the top 5 appears to be tied to alcohol-centric venues. Specifically, bars were absent from the top 5 of the two poorest clusters, and the poorest cluster had no alcohol-centric venue type (bar, liquor store), whereas the second poorest had liquor stores. This suggests that greater wealth in a cluster of neighborhoods is associated with a greater prominence of bars within the area.

On the other hand, pizza place and coffee shop venues appear to be largely independent of the socioeconomic status of clusters, indicating that these venues are fairly universal across neighborhoods.

Another interesting finding is that, of all venue types within the top 5, the only non-food-related ones were parks and bus stations. Bus stations appear in the top 5 of only one cluster, the 2nd poorest. Since this is the only public transportation venue type, it would indicate that public transport is particularly important to this poorer cluster. The reason this appears in cluster 5 rather than cluster 2 (the poorest based on per capita income) is perhaps that cluster 5 has the highest percentage of households below poverty (34.6%), which could contribute to the importance of public transportation: more households are below poverty and therefore fewer are able to afford other types of transportation.

## Conclusion

In conclusion, the effect of socioeconomic status on the prevalence of specific types of venues depends on the type of venue itself. Venues like pizza places and coffee shops appear across most clusters, whereas venues like bars appear only in the more affluent clusters, being absent from the top 5 of the two poorest clusters. The percentage of households below poverty would also appear to indicate the importance of public transportation to a cluster: the cluster with the highest percentage of households below poverty was the only cluster that had a public transportation venue type, Bus Stations, within its top 5.