#Capstone Project: Best College Towns

##Introduction

We run a web-based service that helps students select schools: prospective college students send us requests to compare colleges and universities along a variety of factors.  As high school students prepare to apply, they want to know what their college experience would be like.  What type of college town is each target school based in?  Most students can't afford to visit every campus.  Therefore, we will use the Foursquare API and K-means clustering to get a sense of what types of environments exist and how similar the different schools are.

We receive a request from a student listing the schools they are interested in.  The ten schools the student is considering are: Northwestern University (Evanston, Illinois); University of Michigan (Ann Arbor, Michigan); Stanford (Palo Alto, California); Auburn (Auburn, Alabama); Florida State (Tallahassee, Florida); Wellesley College (Wellesley, Massachusetts); Tufts (Medford, Massachusetts); Columbia University (New York City, New York); Duke University (Durham, North Carolina); Washington University (St. Louis, Missouri).

##Data

We will use the Foursquare API and each school's address to pull the top venues within a 1 kilometer radius of the school, easily within walking distance.  The Foursquare data is a regularly updated, high-quality data set for understanding the types of venues that exist, whether coffee shops, bars, or parks.  To use this data, we will take the school addresses and use the geopy.geocoders package to convert them into latitudes and longitudes for each school.  We will then retrieve the nearby venues from the API.
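
The address-to-coordinates step for all ten schools can be sketched as a simple loop. This is a minimal sketch, not the project's actual code: `geocode_schools`, `fake_geocode`, `Point`, and the "Example College" entry are hypothetical names introduced for illustration, and the geocoder is passed in as a callable so the loop can be tried without a network connection.

```python
from collections import namedtuple

# Stand-in for the object a geocoder returns; geopy's Location objects
# also expose .latitude and .longitude attributes.
Point = namedtuple("Point", ["latitude", "longitude"])


def geocode_schools(addresses, geocode):
    """Map each school name to (latitude, longitude), or None if not found.

    `addresses` is a dict of school name -> street address.
    `geocode` is any callable that takes an address string and returns an
    object with .latitude and .longitude, or None when there is no match.
    """
    coordinates = {}
    for school, address in addresses.items():
        location = geocode(address)
        if location is None:
            coordinates[school] = None  # keep the school so the gap is visible
        else:
            coordinates[school] = (location.latitude, location.longitude)
    return coordinates


# Hypothetical offline geocoder, used only to demonstrate the loop.
def fake_geocode(address):
    known = {
        "633 Clark St, Evanston, IL 60208": Point(42.0505042, -87.679816698913),
    }
    return known.get(address)


schools = {
    "Northwestern University": "633 Clark St, Evanston, IL 60208",
    "Example College": "1 Unknown Rd, Nowhere",  # made-up address with no match
}
coords = geocode_schools(schools, fake_geocode)
```

In the notebook, the `geocode` argument would presumably be the `geocode` method of a `Nominatim` instance, which returns `None` when an address cannot be resolved.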

###Example

In [1]:
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

print('Libraries imported.')

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geopy                     1.19.0                     py_0    conda-forge
Libraries imported.


In [2]:
address = '633 Clark St, Evanston, IL 60208'

geolocator = Nominatim(user_agent="college_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Northwestern University are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Northwestern University are 42.0505042, -87.679816698913.
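
The next cell uses `CLIENT_ID`, `CLIENT_SECRET`, and `VERSION`, which are not defined in the cells shown here. One possible way to supply them without hard-coding secrets in the notebook is to read them from environment variables. This is only a sketch: the function name, the environment variable names, and the default version string are assumptions, not values from the original project.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read API credentials from environment variables.

    The variable names below are hypothetical; use whatever names
    your own setup defines.
    """
    required = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    client_id = env["FOURSQUARE_CLIENT_ID"]
    client_secret = env["FOURSQUARE_CLIENT_SECRET"]
    # The v parameter is a date in YYYYMMDD form; this default is a placeholder.
    version = env.get("FOURSQUARE_VERSION", "20180605")
    return client_id, client_secret, version


# Demonstration with a plain dict standing in for the real environment.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "my-id",
    "FOURSQUARE_CLIENT_SECRET": "my-secret",
}
CLIENT_ID, CLIENT_SECRET, VERSION = load_foursquare_credentials(demo_env)
```

Calling `load_foursquare_credentials()` with no argument reads from the real process environment instead of the demonstration dict.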


In [4]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c9fbdcddd57972612a9a88b'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-54f7e753498ee23deca98e2a-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/pizza_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d1ca941735',
         'name': 'Pizza Place',
         'pluralName': 'Pizza Places',
         'primary': True,
         'shortName': 'Pizza'}],
       'id': '54f7e753498ee23deca98e2a',
       'location': {'address': '1737 Sherman Ave,',
        'cc': 'US',
        'city': 'Evanston',
        'country': 'United States',
        'distance': 176,
        'formattedAddress': ['1737 Sherman Ave,',
         'Evanston, IL 60201',
         'United States'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 42.0
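
The request URL in the cell above is assembled by string formatting. An alternative sketch, assuming the same endpoint and parameters, builds the query string from a dictionary so that each value is URL-encoded automatically. `build_explore_url` and `EXPLORE_ENDPOINT` are hypothetical names, and the credentials shown are placeholders.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(latitude, longitude, client_id, client_secret, version,
                      radius=1000, limit=100):
    """Build the explore URL for one location from a parameter dictionary."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(latitude, longitude),
        "radius": radius,  # meters
        "limit": limit,    # maximum number of venues to return
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; the coordinates are the Northwestern values above.
url = build_explore_url(42.0505042, -87.679816698913,
                        "my-id", "my-secret", "20180605")

# Parse the URL back to confirm the parameters survive the round trip.
query = parse_qs(urlparse(url).query)
```

With the `requests` library, the same dictionary can be passed directly as `requests.get(EXPLORE_ENDPOINT, params=params)`, which performs the encoding itself.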

In [5]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # flattened frames store the list under 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [6]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Blaze Pizza | Pizza Place | 42.049606 | -87.681572 |
| 1 | Evanston Public Library | Library | 42.04826 | -87.679929 |
| 2 | Andy's Frozen Custard | Ice Cream Shop | 42.04847 | -87.681512 |
| 3 | Evanston Athletic Club | Gym | 42.049375 | -87.683101 |
| 4 | Lou Malnati's Pizzeria | Pizza Place | 42.051383 | -87.681848 |
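
To make the flattening step easier to follow, here is a rough pure-Python equivalent that pulls the same four fields out of the raw response items. It is a sketch rather than the notebook's method: `flatten_venues` is a hypothetical helper, and the sample items are trimmed to only the keys it reads (the second one is made up to show the empty-category case).

```python
def flatten_venues(items):
    """Extract name, first category, latitude, and longitude per venue."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories", [])
        location = venue.get("location", {})
        rows.append({
            "name": venue.get("name"),
            # Use the first listed category, or None when there is none.
            "categories": categories[0]["name"] if categories else None,
            "lat": location.get("lat"),
            "lng": location.get("lng"),
        })
    return rows


sample_items = [
    {"venue": {"name": "Blaze Pizza",
               "categories": [{"name": "Pizza Place"}],
               "location": {"lat": 42.049606, "lng": -87.681572}}},
    {"venue": {"name": "Uncategorized Spot",  # made-up venue with no category
               "categories": [],
               "location": {"lat": 42.05, "lng": -87.68}}},
]
rows = flatten_venues(sample_items)
```

The resulting list of dictionaries could be handed to `pd.DataFrame(rows)` to get a table with the same columns as the output above.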


In [7]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

96 venues were returned by Foursquare.


Now we can count how many venues of each category were returned.  Coffee shops are the most popular venue in the area.

In [8]:
nearby_venues.groupby('categories').count()

| categories | name | lat | lng |
|---|---|---|---|
| American Restaurant | 4 | 4 | 4 |
| Art Museum | 2 | 2 | 2 |
| Arts & Crafts Store | 1 | 1 | 1 |
| Asian Restaurant | 3 | 3 | 3 |
| BBQ Joint | 1 | 1 | 1 |
| Bagel Shop | 1 | 1 | 1 |
| Bakery | 2 | 2 | 2 |
| Bar | 3 | 3 | 3 |
| Beach | 2 | 2 | 2 |
| Bookstore | 2 | 2 | 2 |
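
The grouped output is ordered alphabetically, so the most common category is not visible at a glance. A small sketch of ranking categories by frequency, using only the five sample rows shown earlier rather than the full result set:

```python
from collections import Counter

# Categories of the five venues shown in the nearby_venues.head() output.
categories = ["Pizza Place", "Library", "Ice Cream Shop", "Gym", "Pizza Place"]

category_counts = Counter(categories)

# The three most frequent categories as (name, count) pairs, highest first.
top_categories = category_counts.most_common(3)
```

On the full dataframe, `nearby_venues['categories'].value_counts()` would give a comparable ranking sorted from most to least frequent.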


From the data, we hope to give the student a profile of the types of venues that are most common in the vicinity of each school, a sense of how urban or rural the school is based on the number of venues in the area, and an indication of which schools are similar and which are different.
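
As a rough illustration of what such a profile could look like, the sketch below turns a list of venue categories into category shares and compares two campuses with cosine similarity. This is a simpler stand-in for the K-means clustering planned for the project, not that method itself. The function names are hypothetical and the two venue lists are made up for illustration.

```python
import math
from collections import Counter


def category_profile(categories):
    """Return the share of venues in each category (shares sum to 1)."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two sparse category profiles."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[name] * profile_b[name] for name in shared)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Made-up venue lists for two hypothetical campuses.
campus_a = ["Coffee Shop", "Coffee Shop", "Pizza Place", "Bookstore"]
campus_b = ["Coffee Shop", "Bar", "Bar", "Pizza Place"]

profile_a = category_profile(campus_a)
profile_b = category_profile(campus_b)
similarity = cosine_similarity(profile_a, profile_b)

# Venue count within the search radius, a crude proxy for how urban an area is.
venue_count_a = len(campus_a)
```

Profiles like these, one per school, could serve as the rows of the feature matrix passed to a clustering step, while the venue count gives a first indication of how dense the surrounding area is.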