<h1> Best Place to Survive a Zombie Apocalypse </h1>

<h3> By Eduardo Beltrán Herrera </h3>
<h4> Oct. 22, 2020 </h4>

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Conclusion](#results)

## 1. Business Problem <a name="introduction"></a>


<div style="background-color:whitesmoke;">
<p style="font-size:120%;">
    If 2020 has taught us anything, it is that <b>things can always get worse</b>. The global pandemic has shown that, as advanced as our civilizations are, we are still not ready to face the worst scenarios that may come our way. That's why preparation is key, and it's up to us to make the right decisions, with the right data, when the time comes to confront the problems that plague humankind.
</p>
<p style="font-size:120%;">
    As unlikely as it might be, a <b>zombie apocalypse</b> is one of the scenarios we should prepare for. This project tries to solve one of the first problems one would face in such a situation: <b>where should I bunker down once the zombie outbreak has begun?</b>
</p>
<p style="font-size:120%;">
    We'll analyse the neighborhoods of New York City (where, as we all know, all world-changing catastrophes begin) to find the one that best accommodates the following basic needs:
    <ol style="font-size:120%;">
      <li>Food and water (a.k.a. Grocery Stores)</li>
      <li>Medicine & First Aid resources (Pharmacies)</li>
      <li>Construction supplies for bunker-building (Home Depot, Lowe's...)</li>
    </ol>
</p>

<p style="font-size:120%;">
    Other venue categories might be considered as they appear in the data. Neighborhoods will be clustered using the <b>k-means</b> algorithm, and each cluster will then be described so that we can determine the best one.
</p>
<br>
</div>

## 2. Data <a name="data"></a>

<div style="background-color:whitesmoke;">

<p style="font-size:120%;">
    The <b>Foursquare API</b> will be used to obtain data on the venues in each neighborhood. The Foursquare database lets us gather information about venues close to a particular location, including the category of each venue (which tells us whether it is a grocery store, coffee shop, etc.). Foursquare also exposes the ratings given to each venue, but since we are planning for a survival situation, the quality of the place does not concern us. The <b>location and category of the venue</b> should be enough to determine how well a neighborhood meets our needs during the apocalypse.
</p> 
    
<br>    
</div>
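
<div style="background-color:whitesmoke;">

<p style="font-size:120%;">
    As a rough illustration of the two fields we rely on, the sketch below pulls the name, coordinates, and category out of a single venue record. The record is a hand-written sample shaped like the items this notebook reads later on; the real API response may carry more fields, and <code>extract_venue</code> is a hypothetical helper, not part of the Foursquare API.
</p>

</div>

```python
# Hand-written sample record. The field layout mirrors what the notebook
# accesses, but it is an assumption about the response shape, not real API output.
sample_item = {
    "venue": {
        "name": "Rite Aid",
        "location": {"lat": 40.896649, "lng": -73.844846},
        "categories": [{"name": "Pharmacy"}],
    }
}


def extract_venue(item):
    """Return (name, latitude, longitude, category) for one venue record.

    Hypothetical helper. A venue without any category gets None as its
    category instead of raising an IndexError.
    """
    venue = item["venue"]
    categories = venue.get("categories") or []
    category = categories[0]["name"] if categories else None
    return (
        venue["name"],
        venue["location"]["lat"],
        venue["location"]["lng"],
        category,
    )


venue_record = extract_venue(sample_item)
print(venue_record)
```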

## 3. Methodology <a name="methodology"></a>

<div style="background-color:whitesmoke;">

<p style="font-size:120%;">
A listing of New York neighborhoods is imported the same way it was done in Week 3. Foursquare queries are executed for each neighborhood to find venues within a fixed radius (500 meters by default, up to 60 venues per query).
</p>   
    
</div>

In [3]:
#Libraries

import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

# libraries for displaying images and HTML
from IPython.display import Image, HTML

# transforming a JSON file into a pandas dataframe
# (top-level import; pandas.io.json.json_normalize is deprecated since pandas 1.0)
from pandas import json_normalize

import folium # plotting library
from geopy.geocoders import Nominatim

from sklearn.cluster import KMeans

In [4]:
# Listing of New York neighborhoods
neighborhoods = pd.read_csv("NY_neighborhoods.csv")
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [5]:
#Geocoder for NYC
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

In [6]:
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

In [7]:
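# Foursquare credentials, read from environment variables instead of being
# hardcoded in the notebook (the variable names below are placeholders)
import os

CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605'
LIMIT = 60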

In [8]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    # For each neighborhood in NYC, find the closest venues by category
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
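
<div style="background-color:whitesmoke;">

<p style="font-size:120%;">
    The function above builds the URL by string formatting and indexes straight into the JSON. A slightly more defensive variant, sketched below, passes the query through the <code>params</code> argument of <code>requests.get</code>, sets a timeout, and checks the HTTP status before parsing. The helper names and the placeholder credentials are assumptions made for illustration; the endpoint and parameter names are the ones already used in this notebook.
</p>

</div>

```python
import requests

# Endpoint used by the notebook's getNearbyVenues function.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_params(lat, lng, client_id, client_secret,
                         version="20180605", radius=500, limit=60):
    """Collect the query parameters for one neighborhood lookup (hypothetical helper)."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }


def fetch_venue_items(params, timeout=10):
    """Request venues and return the list of items, or [] if none are reported.

    Assumes the response layout the notebook relies on:
    response -> groups -> [0] -> items.
    """
    response = requests.get(EXPLORE_URL, params=params, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of a KeyError later
    groups = response.json().get("response", {}).get("groups", [])
    return groups[0].get("items", []) if groups else []


# Building the parameters needs no network access.
params = build_explore_params(40.894705, -73.847201,
                              "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(params["ll"], params["radius"], params["limit"])
```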

In [9]:
ny_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )
print('Venue extraction complete.')

Venue extraction complete.


In [10]:
ny_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Wakefield,40.894705,-73.847201,Lollipops Gelato,40.894123,-73.845892,Dessert Shop
1,Wakefield,40.894705,-73.847201,Rite Aid,40.896649,-73.844846,Pharmacy
2,Wakefield,40.894705,-73.847201,Carvel Ice Cream,40.890487,-73.848568,Ice Cream Shop
3,Wakefield,40.894705,-73.847201,Walgreens,40.896528,-73.8447,Pharmacy
4,Wakefield,40.894705,-73.847201,Dunkin',40.890459,-73.849089,Donut Shop


In [11]:
#Check all the different venue categories

ny_venues['Venue Category'].unique()

array(['Dessert Shop', 'Pharmacy', 'Ice Cream Shop', 'Donut Shop',
       'Gas Station', 'Sandwich Place', 'Deli / Bodega', 'Pizza Place',
       'Laundromat', 'Discount Store', 'Post Office', 'Bagel Shop',
       'Grocery Store', 'Fast Food Restaurant', 'Restaurant',
       'Bus Station', 'Baseball Field', 'Chinese Restaurant', 'Trail',
       'Park', 'Bar', 'Accessories Store', 'Caribbean Restaurant',
       'Diner', 'Seafood Restaurant', 'Bowling Alley', 'Automotive Shop',
       'Food & Drink Shop', 'Metro Station', 'Convenience Store',
       'Juice Bar', 'Bus Stop', 'Cosmetics Shop', 'Plaza', 'River',
       'Medical Supply Store', 'Bank', 'Moving Target', 'Food Truck',
       'Home Service', 'Gym', 'Playground', 'Gourmet Shop',
       'Latin American Restaurant', 'Burger Joint', 'Pub', 'Beer Bar',
       'Warehouse Store', 'Mexican Restaurant', 'Coffee Shop',
       'Spanish Restaurant', 'Wings Joint', 'Thrift / Vintage Store',
       'Supermarket', 'Bakery', 'Candy Store', 'Ren

<div style="background-color:whitesmoke;">

<p style="font-size:120%;">
    After a quick look at all venue categories, the ones considered relevant for surviving a zombie apocalypse are:
        <ul style="font-size:120%;">
      <li>Pharmacy</li>
      <li>Convenience Store</li>
      <li>Grocery Store</li>
      <li>Supermarket</li>
      <li>Furniture / Home Store</li>
    </ul>
    
</p> 
   
</div>

In [12]:
#Select only relevant venues
venues = ['Pharmacy', 'Convenience Store', 'Grocery Store', 'Supermarket', 'Furniture / Home Store']

ny_venues_rel = ny_venues[ny_venues['Venue Category'].isin(venues)]
ny_venues_rel.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1,Wakefield,40.894705,-73.847201,Rite Aid,40.896649,-73.844846,Pharmacy
3,Wakefield,40.894705,-73.847201,Walgreens,40.896528,-73.8447,Pharmacy
10,Co-op City,40.874294,-73.829939,Rite Aid,40.870345,-73.828302,Pharmacy
15,Co-op City,40.874294,-73.829939,Food Universe Marketplace,40.87674,-73.82898,Grocery Store
44,Eastchester,40.887556,-73.827806,Adil Newsstand & Grocery,40.888433,-73.831277,Convenience Store


In [13]:
print('Number of venues: ', len(ny_venues_rel.index))
ny_venues_rel['Venue Category'].unique()

Number of venues:  508


array(['Pharmacy', 'Grocery Store', 'Convenience Store', 'Supermarket',
       'Furniture / Home Store'], dtype=object)

In [14]:
#Group by neighborhood - venue category

venue_count = pd.DataFrame(ny_venues_rel.groupby(['Neighborhood', 'Venue Category'])['Venue Category'].agg(['count'])).reset_index()
venue_count.head()

Unnamed: 0,Neighborhood,Venue Category,count
0,Allerton,Grocery Store,1
1,Allerton,Pharmacy,1
2,Allerton,Supermarket,2
3,Arden Heights,Pharmacy,1
4,Arlington,Grocery Store,1


In [15]:
venue_count['Venue Category'].value_counts()

Grocery Store             126
Pharmacy                  111
Supermarket                78
Convenience Store          41
Furniture / Home Store     33
Name: Venue Category, dtype: int64

<div style="background-color:whitesmoke;">

<p style="font-size:120%;">
    We can see that grocery stores and pharmacies appear in the most neighborhoods (126 and 111 respectively), while home stores appear in only 33. Neighborhoods that have home stores will likely be clustered together.
    
</p> 
   
</div>
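
<div style="background-color:whitesmoke;">

<p style="font-size:120%;">
    The counts above come from the grouped table, so each figure is the number of neighborhoods with at least one venue of that category rather than the number of venues. The sketch below shows the difference on a small invented table standing in for <code>ny_venues_rel</code>.
</p>

</div>

```python
import pandas as pd

# Toy stand-in for ny_venues_rel (values invented for illustration).
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "C"],
    "Venue Category": ["Pharmacy", "Pharmacy", "Grocery Store",
                       "Pharmacy", "Supermarket", "Grocery Store"],
})

# Total number of venues per category.
venues_per_category = toy_venues["Venue Category"].value_counts()

# Number of neighborhoods that contain at least one venue of each category.
neighborhoods_per_category = (
    toy_venues.drop_duplicates(["Neighborhood", "Venue Category"])["Venue Category"]
    .value_counts()
)

print(venues_per_category)
print(neighborhoods_per_category)
```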

In [16]:
# aggfunc='sum' avoids the FutureWarning newer pandas versions raise for np.sum
venue_count = pd.pivot_table(venue_count, values='count', index=['Neighborhood'],
                    columns=['Venue Category'], aggfunc='sum', fill_value=0).reset_index()

venue_count

Venue Category,Neighborhood,Convenience Store,Furniture / Home Store,Grocery Store,Pharmacy,Supermarket
0,Allerton,0,0,1,1,2
1,Arden Heights,0,0,0,1,0
2,Arlington,0,0,1,0,0
3,Arrochar,0,0,0,1,1
4,Astoria,0,0,1,0,0
...,...,...,...,...,...,...
210,Wingate,0,0,1,1,0
211,Woodhaven,0,0,0,2,1
212,Woodlawn,0,0,1,0,0
213,Woodrow,0,0,1,2,0
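
<div style="background-color:whitesmoke;">

<p style="font-size:120%;">
    As an alternative sketch, <code>pd.crosstab</code> can build the same kind of neighborhood-by-category count table in one step from the filtered venue list, without the intermediate <code>groupby</code>. The toy frame below stands in for <code>ny_venues_rel</code>; its values are invented.
</p>

</div>

```python
import pandas as pd

# Toy stand-in for ny_venues_rel (values invented for illustration).
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "C"],
    "Venue Category": ["Pharmacy", "Pharmacy", "Grocery Store",
                       "Pharmacy", "Supermarket", "Grocery Store"],
})

# One row per neighborhood, one column per category; missing pairs become 0.
toy_counts = pd.crosstab(
    toy_venues["Neighborhood"], toy_venues["Venue Category"]
).reset_index()

print(toy_counts)
```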


In [18]:
venue_count.dtypes

Venue Category
Neighborhood              object
Convenience Store          int64
Furniture / Home Store     int64
Grocery Store              int64
Pharmacy                   int64
Supermarket                int64
dtype: object

In [19]:
# Clustering

# set number of clusters
kclusters = 5

venue_clustering = venue_count.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(venue_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 0, 1, 0, 1, 3, 0, 0, 1, 4])
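
<div style="background-color:whitesmoke;">

<p style="font-size:120%;">
    The value <code>kclusters = 5</code> is chosen by hand. One common way to sanity-check such a choice is to compare the inertia (within-cluster sum of squares) and the silhouette score across several values of k. The sketch below does this on a small synthetic count matrix, since the full <code>venue_clustering</code> table is not reproduced here; the numbers it prints say nothing about the NYC data.
</p>

</div>

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic stand-in: 60 "neighborhoods" x 5 venue categories with small counts.
toy_counts = rng.poisson(lam=1.0, size=(60, 5))

scores = {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(toy_counts)
    # Lower inertia means tighter clusters; higher silhouette means better separation.
    scores[k] = (model.inertia_, silhouette_score(toy_counts, model.labels_))

for k, (inertia, silhouette) in scores.items():
    print(f"k={k}: inertia={inertia:.1f}, silhouette={silhouette:.3f}")
```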

In [20]:
venue_count.insert(0, 'Cluster Labels', kmeans.labels_)
venue_count.head()

Venue Category,Cluster Labels,Neighborhood,Convenience Store,Furniture / Home Store,Grocery Store,Pharmacy,Supermarket
0,2,Allerton,0,0,1,1,2
1,0,Arden Heights,0,0,0,1,0
2,1,Arlington,0,0,1,0,0
3,0,Arrochar,0,0,0,1,1
4,1,Astoria,0,0,1,0,0


## 4. Analysis <a name="analysis"></a>

In [21]:
venue_count['Cluster Labels'].value_counts()

1    77
0    64
3    39
4    23
2    12
Name: Cluster Labels, dtype: int64
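
<div style="background-color:whitesmoke;">

<p style="font-size:120%;">
    A compact way to describe each cluster is to average the venue counts of its member neighborhoods. The sketch below does this for the five rows of <code>venue_count.head()</code> shown above; on the full table the same <code>groupby</code> would give one profile row per cluster.
</p>

</div>

```python
import pandas as pd

# The five rows displayed by venue_count.head() after adding cluster labels.
sample = pd.DataFrame(
    [
        (2, "Allerton", 0, 0, 1, 1, 2),
        (0, "Arden Heights", 0, 0, 0, 1, 0),
        (1, "Arlington", 0, 0, 1, 0, 0),
        (0, "Arrochar", 0, 0, 0, 1, 1),
        (1, "Astoria", 0, 0, 1, 0, 0),
    ],
    columns=["Cluster Labels", "Neighborhood", "Convenience Store",
             "Furniture / Home Store", "Grocery Store", "Pharmacy", "Supermarket"],
)

# Mean count of each venue category per cluster.
cluster_profile = (
    sample.drop(columns="Neighborhood")
    .groupby("Cluster Labels")
    .mean()
)
print(cluster_profile)
```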

<div style="background-color:whitesmoke;">

### Cluster 0    
    
<p style="font-size:120%;">
    Cluster 0 contains neighborhoods with one or two pharmacies and, occasionally, a supermarket.
</p> 
   
</div>

In [22]:
venue_count[venue_count['Cluster Labels'] == 0]

Venue Category,Cluster Labels,Neighborhood,Convenience Store,Furniture / Home Store,Grocery Store,Pharmacy,Supermarket
1,0,Arden Heights,0,0,0,1,0
3,0,Arrochar,0,0,0,1,1
6,0,Auburndale,0,0,0,1,1
7,0,Bath Beach,0,0,0,2,0
12,0,Bayside,0,0,0,1,0
...,...,...,...,...,...,...,...
200,0,Wakefield,0,0,0,2,0
203,0,West Brighton,0,0,0,1,1
206,0,Westchester Square,0,0,0,2,1
211,0,Woodhaven,0,0,0,2,1


<div style="background-color:whitesmoke;">

### Cluster 1    
    
<p style="font-size:120%;">
    Cluster 1 neighborhoods contain one or two grocery stores, and not much else.
</p> 
   
</div>

In [23]:
venue_count[venue_count['Cluster Labels'] == 1]

Venue Category,Cluster Labels,Neighborhood,Convenience Store,Furniture / Home Store,Grocery Store,Pharmacy,Supermarket
2,1,Arlington,0,0,1,0,0
4,1,Astoria,0,0,1,0,0
8,1,Battery Park City,1,0,1,0,0
15,1,Bellaire,1,0,1,0,0
17,1,Belmont,0,0,1,0,0
...,...,...,...,...,...,...,...
202,1,Weeksville,0,1,1,0,0
208,1,Williamsburg,0,0,1,0,0
209,1,Windsor Terrace,0,0,2,0,0
210,1,Wingate,0,0,1,1,0


<div style="background-color:whitesmoke;">

### Cluster 2    
    
<p style="font-size:120%;">
    Cluster 2 contains neighborhoods with one to three supermarkets, along with a fair number of pharmacies and grocery stores.
</p> 
   
</div>

In [24]:
venue_count[venue_count['Cluster Labels'] == 2]

Venue Category,Cluster Labels,Neighborhood,Convenience Store,Furniture / Home Store,Grocery Store,Pharmacy,Supermarket
0,2,Allerton,0,0,1,1,2
10,2,Bay Terrace,0,1,0,1,3
13,2,Bedford Park,1,0,1,2,2
27,2,Bulls Head,1,0,2,2,1
51,2,Concourse Village,2,0,0,2,2
72,2,Far Rockaway,0,0,2,1,2
112,2,Kingsbridge,0,1,1,2,2
128,2,Melrose,0,0,2,3,1
139,2,Mount Eden,0,0,2,3,2
149,2,Ocean Hill,1,0,2,1,2


<div style="background-color:whitesmoke;">

### Cluster 3   
    
<p style="font-size:120%;">
    Cluster 3 contains neighborhoods with very few relevant venues, one or two at most, typically a convenience store, a supermarket, or a home store.
</p> 
   
</div>

In [25]:
venue_count[venue_count['Cluster Labels'] == 3]

Venue Category,Cluster Labels,Neighborhood,Convenience Store,Furniture / Home Store,Grocery Store,Pharmacy,Supermarket
5,3,Astoria Heights,0,0,0,0,1
11,3,Baychester,1,0,0,0,1
14,3,Beechhurst,1,0,0,0,1
21,3,Briarwood,1,0,0,0,0
24,3,Bronxdale,1,0,0,0,0
26,3,Brownsville,1,0,0,0,0
37,3,Chinatown,0,1,0,0,1
40,3,Civic Center,0,1,0,0,0
44,3,Clinton,0,0,0,0,1
45,3,Clinton Hill,1,0,0,0,0


<div style="background-color:whitesmoke;">

### Cluster 4   
    
<p style="font-size:120%;">
    Cluster 4 contains the neighborhoods with the most grocery stores, along with some pharmacies and supermarkets.
</p> 
   
</div>

In [26]:
venue_count[venue_count['Cluster Labels'] == 4]

Venue Category,Cluster Labels,Neighborhood,Convenience Store,Furniture / Home Store,Grocery Store,Pharmacy,Supermarket
9,4,Bay Ridge,0,0,2,2,0
19,4,Boerum Hill,0,3,2,0,0
39,4,City Line,0,0,2,1,0
41,4,Claremont Village,0,0,3,0,0
50,4,Concourse,0,0,4,1,0
71,4,Erasmus,1,1,2,1,1
81,4,Forest Hills Gardens,0,0,2,1,0
102,4,Homecrest,0,0,3,1,0
110,4,Kensington,0,1,3,1,1
113,4,Kingsbridge Heights,0,0,3,1,0


<div style="background-color:whitesmoke;">
    
<p style="font-size:120%;">
    Because of the high volume of supermarkets, grocery stores and pharmacies, we conclude that <b>Cluster 2</b> has the best neighborhoods for our survival needs. In particular, <b>Kingsbridge</b> offers a good mix of food resources (two supermarkets and a grocery store) and medicine (two pharmacies), plus a highly sought-after home store for all our bunkering needs.
    
</p> 
   
</div>

In [29]:
neighs_c2 = venue_count[venue_count['Cluster Labels'] == 2]
neighs_c2[neighs_c2['Neighborhood'] == "Kingsbridge"]

Venue Category,Cluster Labels,Neighborhood,Convenience Store,Furniture / Home Store,Grocery Store,Pharmacy,Supermarket
112,2,Kingsbridge,0,1,1,2,2
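
<div style="background-color:whitesmoke;">

<p style="font-size:120%;">
    To make the choice within Cluster 2 less of an eyeball judgment, one could keep only the neighborhoods that cover all three needs (food, medicine, building supplies) and rank them by total venue count. The sketch below applies that idea to the ten Cluster 2 rows displayed earlier; the coverage rule and the ranking by total count are assumptions made for illustration, not part of the original analysis.
</p>

</div>

```python
import pandas as pd

columns = ["Neighborhood", "Convenience Store", "Furniture / Home Store",
           "Grocery Store", "Pharmacy", "Supermarket"]
# Cluster 2 rows as displayed in the notebook output.
cluster2 = pd.DataFrame(
    [
        ("Allerton", 0, 0, 1, 1, 2),
        ("Bay Terrace", 0, 1, 0, 1, 3),
        ("Bedford Park", 1, 0, 1, 2, 2),
        ("Bulls Head", 1, 0, 2, 2, 1),
        ("Concourse Village", 2, 0, 0, 2, 2),
        ("Far Rockaway", 0, 0, 2, 1, 2),
        ("Kingsbridge", 0, 1, 1, 2, 2),
        ("Melrose", 0, 0, 2, 3, 1),
        ("Mount Eden", 0, 0, 2, 3, 2),
        ("Ocean Hill", 1, 0, 2, 1, 2),
    ],
    columns=columns,
)

# Assumed rule: a neighborhood qualifies if it has food, medicine, and a home store.
food = cluster2[["Convenience Store", "Grocery Store", "Supermarket"]].sum(axis=1)
covers_all = (
    (food > 0)
    & (cluster2["Pharmacy"] > 0)
    & (cluster2["Furniture / Home Store"] > 0)
)

# Rank the qualifying neighborhoods by their total number of relevant venues.
ranked = (
    cluster2[covers_all]
    .assign(Total=lambda df: df.drop(columns="Neighborhood").sum(axis=1))
    .sort_values("Total", ascending=False)
)
print(ranked[["Neighborhood", "Total"]])
```

<div style="background-color:whitesmoke;">

<p style="font-size:120%;">
    On these rows only Kingsbridge and Bay Terrace have a home store, and Kingsbridge has the higher total (6 venues against 5).
</p>

</div>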


## 5. Results and Conclusion <a name="results"></a>

<div style="background-color:whitesmoke;">
    
<p style="font-size:120%;">
    We have concluded that <b>Kingsbridge</b> is probably the safest neighborhood to go to in order to survive the zombie apocalypse. However, a more complex analysis is recommended to include other factors that will affect our survival odds, such as population density and transportation options.
<br>
    Overall, our analysis should serve as a handy guide for keen survivalists to further expand their plans, in case the worst is yet to come...
    
</p> 
   
</div>