<h1>The Battle of Neighborhoods: Five Tech Cities</h1>

<h2><u>Business Problem</u></h2>

<h4>According to <a href="https://www.washingtonpost.com/business/2019/12/09/explosion-us-tech-jobs-concentrated-just-five-metro-areas-study-finds/">The Washington Post</a>, we are living through an unprecedented era of growth in tech jobs, and that growth is concentrated in five metro areas: Seattle, Boston, San Francisco, San Jose, and San Diego. Startups are increasingly calling these cities home, and established companies are shifting to a tech-focused mindset to compete within their industries. The result is more jobs that require a tech background, making skills such as data science, coding, and agile project management more important than ever.</h4>

<h4>Just five metro areas (Boston, San Diego, San Francisco, San Jose, and Seattle) accounted for roughly 90% of the 256,063 tech jobs created from 2005 to 2017, according to the Brookings Institution and the Information Technology and Innovation Foundation. The remaining 10% was divided among 377 other areas. The share of these jobs dropped significantly in would-be hubs such as Chicago; Durham, North Carolina; Philadelphia; Dallas; and Wichita.</h4>

<h4>In this scenario, machine learning tools can help our tech workforce (herein referred to as “techies”) who want to live and work in one of five popular, populous US cities (Philadelphia, PA; New York, NY; Chicago, IL; San Francisco, CA; and Boston, MA) choose the city with the highest density of tech startups to apply to. The business problem is therefore: how can we support techies in applying to tech startups in Philadelphia, New York, Chicago, San Francisco, and Boston?
To solve this business problem, we will use the FourSquare API to collect data on the tech startup venues in each of these five cities. We look forward to guiding our techies to new areas of opportunity, literally rather than figuratively, and may the best city win.</h4>


<h2><u>Data Section</u></h2>

<h4>The main goal is to find the city with the highest density of tech startups, using the venues endpoint of the FourSquare API. The near query parameter returns venues in and around each named city, and the Category ID 4bf58dd8d48988d125941735 restricts the results to tech startups, with a limit of 100 venues per query. This request was made for each city (Philadelphia, PA; New York, NY; Chicago, IL; San Francisco, CA; and Boston, MA), and the name and coordinates of each returned venue were plotted on a map for visualization. The latitudes and longitudes of all venues in a city were then averaged to obtain the mean coordinates. Finally, the Euclidean distance from each venue to the mean coordinates was calculated as a measure of how tightly the tech startups in that city cluster together.</h4>
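<h4>The last two steps described above (averaging the coordinates, then averaging each venue's distance to that mean point) can be sketched in a few lines. This is a minimal illustration on made-up toy coordinates, not the notebook's actual data; the function name <em>mean_distance_from_mean</em> is hypothetical, and the distance is in degrees because it is computed directly on latitude and longitude.</h4>

```python
import numpy as np

def mean_distance_from_mean(lats, lngs):
    """Return the mean point and the average Euclidean distance (in degrees) to it."""
    # stack the coordinates into an (n, 2) array of [lat, lng] rows
    coords = np.column_stack([lats, lngs]).astype(float)
    # mean latitude and mean longitude of all points
    center = coords.mean(axis=0)
    # straight-line distance from every point to the mean point
    distances = np.linalg.norm(coords - center, axis=1)
    return center, distances.mean()

# toy coordinates (made up for illustration): the four corners of a square
toy_lats = [0.0, 0.0, 2.0, 2.0]
toy_lngs = [0.0, 2.0, 0.0, 2.0]
center, md = mean_distance_from_mean(toy_lats, toy_lngs)
print(center, md)
```

<h4>A smaller value means the points sit closer to their common center, which is how the measure is read in the analysis below.</h4>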

<h2><u>Methodology Section</u></h2>

<h4>We retrieve the locations of tech startups in Philadelphia, PA, New York, NY, Chicago, IL, San Francisco, CA, and Boston, MA through the FourSquare API and arrange them for visualization, then compare the cities by how densely those startups are located. From this we will be able to recommend a final city where our techies can apply for jobs and potentially attend several interviews in the same day or week.</h4>
<h4>The Methodology section comprises four stages:</h4>
<ol><li><strong>Collect Data</strong></li>
    <li><strong>Explore and Understand data visualization. Enjoy clicking on each circle to see the name of each tech startup</strong></li>
    <li><strong>Data Preparation before Mean distance </strong></li>
    <li><strong>Processing Data after Mean distance</strong></li></ol>

In [17]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas 1.0 or later)
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
!pip install folium # map rendering library
import folium
print('Libraries imported.')

Libraries imported.


In [18]:
CLIENT_ID = 'PRGL0DCJGJ5XIV2JWJP3RSMPKYE40CGZK5AG3O2HIMRX3PSE' # your Foursquare ID
CLIENT_SECRET = 'PNGOS22FMJ3H2QNVLF5AC4NE0MKK5KRZQ1XKTDYQPWYQYDZB' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)

Your credentials:
CLIENT_ID: PRGL0DCJGJ5XIV2JWJP3RSMPKYE40CGZK5AG3O2HIMRX3PSE


In [19]:
LIMIT = 100 # maximum number of venues returned per query
cities = ['Philadelphia, PA','New York, NY', 'Chicago, IL', 'San Francisco, CA', 'Boston, MA']
results = {}
for city in cities:
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&near={}&limit={}&categoryId={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        city,
        LIMIT,
        "4bf58dd8d48988d125941735") # TECH STARTUP CATEGORY ID
    results[city] = requests.get(url).json()

In [20]:
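df_venues = {}
for city in cities:
    # flatten the nested JSON venue items into a table
    venues = json_normalize(results[city]['response']['groups'][0]['items'])
    # keep only name, address, and coordinates; copy so the rename below does not act on a view
    df_venues[city] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']].copy()
    df_venues[city].columns = ['Name', 'Address', 'Lat', 'Lng']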

In [21]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)

    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])
    print("--------Collect Data--------")
    print(f"Total number of tech startups in {city} = ", results[city]['response']['totalResults'])
    print("Showing Top 100")

--------Collect Data--------
Total number of tech startups in Philadelphia, PA =  107
Showing Top 100
--------Collect Data--------
Total number of tech startups in New York, NY =  180
Showing Top 100
--------Collect Data--------
Total number of tech startups in Chicago, IL =  141
Showing Top 100
--------Collect Data--------
Total number of tech startups in San Francisco, CA =  195
Showing Top 100
--------Collect Data--------
Total number of tech startups in Boston, MA =  130
Showing Top 100


<h2><u>Philadelphia, PA</u></h2>

In [22]:
maps[cities[0]]

<h2><u>New York, NY</u></h2>

In [23]:
maps[cities[1]]

<h2><u>Chicago, IL</u></h2>

In [24]:
maps[cities[2]]

<h2><u>San Francisco, CA</u></h2>

In [25]:
maps[cities[3]]

<h2><u>Boston, MA</u></h2>

In [26]:
maps[cities[4]]

<h2><center>At first glance, Boston, MA, New York, NY, and Philadelphia, PA look densely packed with tech startups. We can do better than a visual impression by putting a concrete measure on this density with a statistical approach. We compute the mean location of all tech startups in a city: density is high when the venues lie near this point and low when they lie far from it. The average distance from each venue to the mean coordinates therefore shows how tightly the startups cluster, exposes the outliers, and points to the best area to consider when looking to live and work near the highest concentration of tech startups.<br>
    <br>The mean distance from the mean coordinates is based on Euclidean distance (the straight-line distance between two points), averaged over all venues in a city. This data could be processed further with pattern classification, k-nearest neighbors, or other distance functions, but those are outside the scope of this analysis. In the maps below, the mean coordinates are marked with a large yellow circle and the distances with black lines.</center></h2>
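<h4>For reference, the measure described above can be written out as follows. This is a sketch of the notation only: the symbols (latitude φ, longitude λ, and n venues per city) are introduced here for illustration and do not appear in the original analysis. Because the inputs are latitude and longitude, MD is expressed in degrees, not in kilometers or miles.</h4>

```latex
\bar{\phi} = \frac{1}{n}\sum_{i=1}^{n}\phi_i,
\qquad
\bar{\lambda} = \frac{1}{n}\sum_{i=1}^{n}\lambda_i,
\qquad
\mathrm{MD} = \frac{1}{n}\sum_{i=1}^{n}
\sqrt{(\phi_i-\bar{\phi})^2 + (\lambda_i-\bar{\lambda})^2}
```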

<h2><u>Analysis</u></h2>

In [27]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)
    venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 
    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])
        folium.PolyLine([venues_mean_coor, [lat, lng]], color="black", weight=1.5, opacity=0.5).add_to(maps[city])
    
    label = folium.Popup("Mean Co-ordinate", parse_html=True)
    folium.CircleMarker(
        venues_mean_coor,
        radius=10,
        popup=label,
        color='yellow',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(maps[city])

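    print(city)
    print("Mean Distance from Mean coordinates(MD)")
    # Euclidean distance (in degrees) from each venue to the mean coordinates, averaged over all venues
    print(np.mean(np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[city][['Lat','Lng']].values)))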

Philadelphia, PA
Mean Distance from Mean coordinates(MD)
0.03704758874316186
New York, NY
Mean Distance from Mean coordinates(MD)
0.01588455816583397
Chicago, IL
Mean Distance from Mean coordinates(MD)
0.029379499925633813
San Francisco, CA
Mean Distance from Mean coordinates(MD)
0.013971154430047825
Boston, MA
Mean Distance from Mean coordinates(MD)
0.01866261368241032
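
<h4>To make the comparison easier to read, the MD values printed above can be sorted from smallest (tightest cluster) to largest. This is a small sketch that copies the printed values by hand into a dictionary; the name <em>md_by_city</em> is introduced here for illustration and is not part of the original notebook.</h4>

```python
# MD values copied from the output above (in degrees)
md_by_city = {
    'Philadelphia, PA': 0.03704758874316186,
    'New York, NY': 0.01588455816583397,
    'Chicago, IL': 0.029379499925633813,
    'San Francisco, CA': 0.013971154430047825,
    'Boston, MA': 0.01866261368241032,
}

# sort ascending: a smaller MD means the venues sit closer to their mean point
ranking = sorted(md_by_city.items(), key=lambda item: item[1])
for rank, (city, md) in enumerate(ranking, start=1):
    print(f"{rank}. {city}: MD = {md:.4f}")
```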


<h2><u>Philadelphia, PA</u></h2>

In [28]:
maps[cities[0]]

<h2><u>New York, NY</u></h2>

In [29]:
maps[cities[1]]

<h2><u>Chicago, IL</u></h2>

In [30]:
maps[cities[2]]

<h2><u>San Francisco, CA</u></h2>

In [31]:
maps[cities[3]]

<h2><u>Boston, MA</u></h2>

In [32]:
maps[cities[4]]

<h2><center>We can see that San Francisco, CA is the best city for any techie! It has the smallest mean distance of the five cities, meaning its tech startups sit closest together, which makes it easier for techies to apply to many jobs in the same area. The maps of the other four cities pinpoint where techies can find the most opportunities if they are still interested in applying to jobs there. You can also clearly see the outliers in each map, which lie far from the mean coordinates.</center></h2>

<h1><u>Results and Discussion Section</u></h1>

<h4>Tech startups have grown rapidly, and of the five cities we analyzed, San Francisco, CA still holds its ground as one of the most sought-after locations for them. For techies looking for the most opportunities in a single location, it is definitely worth your time and effort to send job applications and set up interviews there until you get hired!</h4>

<h4>Five cities were analyzed: Philadelphia, PA, New York, NY, Chicago, IL, San Francisco, CA, and Boston, MA. The goal was to determine which of them has the tightest cluster of tech startups and would therefore increase your hiring possibilities. The data also gives the location of each tech startup, which you can research to decide whether the company aligns with your values and goals. On the San Francisco, CA map, for example, you can see that <a href="https://foursquare.com/v/pivotal-hq/49b83816f964a52038531fe3">Pivotal HQ</a>, <a href="https://foursquare.com/v/playhaven-south-hq/4cf98c5f0df3236a3559eea9">PlayHaven South HQ</a>, and <a href="https://bridgefy.me/">Bridgefy SF HQ</a> are very close to each other; if you lived around that location, you would be strategically placed near many other potential employers.</h4>

<h4>Now let's look at Philadelphia, PA, where you can find a small cluster of tech startups such as <a href="https://foursquare.com/v/appadooo/52da26d311d24984005cb2b4">Appadooo</a>, <a href="https://foursquare.com/v/newquest-usa/5165833519a967ebb102aebd">NewQuest USA</a>, and <a href="https://foursquare.com/v/ttr-data-recovery-services--philadelphia/57cf812b498e0cf72a413af3">TTR Data Recovery Services - Philadelphia</a> located relatively close together. If it is a city where you are considering living and working, you can definitely explore your opportunities there. You can apply the same approach to clusters in the other cities and find your own opportunities!</h4>


<h1><u>Conclusion</u></h1>

<h2><center>The increase in tech jobs and tech startups shows us how entrepreneurial our society has become. Continuous improvements and innovations in technology will help steer us towards more connectivity and easier access to information, with which we can become a better society. Change takes time, but more and more tech startups are likely to emerge as time goes on. I look forward to your own growth, and I hope you will use this data to explore the tech startups in the five cities analyzed (Philadelphia, PA, New York, NY, Chicago, IL, San Francisco, CA, and Boston, MA), research a few companies, and take a look at their products and services!</center></h2>