# Introduction/Business Problem

## Do a university's local venues play a role in its students' success?
Universities in the US vary widely in size, audience, and student success rates. Some colleges are small and cater mostly to local, in-state students. Others are massive and draw a large variety of students from all around the world. What drives students to attend these larger universities? Some may point to student success rates, while others look for programs tailored to what they want to study and eventually pursue in the workforce. What is less often considered is how a student feels about living near the university, which depends in part on the venues available nearby. For example, a student considering an out-of-state college may prefer a university with a variety of local venues to visit and explore, while an in-state student (or one who already lives locally) may care less because the area is already familiar. Could a greater variety of nearby venues help a university produce more successful students, given that the majority of its students come from out of state? We will explore this question further.

# Data


We retrieved a dataset from <a href="https://www.kaggle.com/sumithbhongale/american-university-data-ipeds-dataset">Kaggle.com</a> that gives the locations of universities along with various statistics from 2013. In particular, we will look at universities where a majority of students come from out of state and measure how each university performs using graduation statistics and pre-college test scores. We will compare this performance with the variety of nearby venues and examine which kinds of nearby venues correlate with a high success rate in the classroom.

In [10]:
import pandas as pd
import numpy as np

df = pd.read_csv('UniversityData.csv')

df.head()

Unnamed: 0,ID number,Name,year,ZIP code,Highest degree offered,County name,Longitude location of institution,Latitude location of institution,Religious affiliation,Offers Less than one year certificate,...,Percent of freshmen receiving federal grant aid,Percent of freshmen receiving Pell grants,Percent of freshmen receiving other federal grant aid,Percent of freshmen receiving state/local grant aid,Percent of freshmen receiving institutional grant aid,Percent of freshmen receiving student loan aid,Percent of freshmen receiving federal student loans,Percent of freshmen receiving other loan aid,Endowment assets (year end) per FTE enrollment (GASB),Endowment assets (year end) per FTE enrollment (FASB)
0,100654,Alabama A & M University,2013,35762,Doctor's degree - research/scholarship,Madison County,-86.568502,34.783368,Not applicable,Implied no,...,81.0,81.0,7.0,1.0,32.0,89.0,89.0,1.0,,
1,100663,University of Alabama at Birmingham,2013,35294-0110,Doctor's degree - research/scholarship and pro...,Jefferson County,-86.80917,33.50223,Not applicable,Implied no,...,36.0,36.0,10.0,0.0,60.0,56.0,55.0,5.0,24136.0,
2,100690,Amridge University,2013,36117-3553,Doctor's degree - research/scholarship and pro...,Montgomery County,-86.17401,32.362609,Churches of Christ,Implied no,...,90.0,90.0,0.0,40.0,90.0,100.0,100.0,0.0,,302.0
3,100706,University of Alabama in Huntsville,2013,35899,Doctor's degree - research/scholarship and pro...,Madison County,-86.63842,34.722818,Not applicable,Yes,...,31.0,31.0,4.0,1.0,63.0,46.0,46.0,3.0,11502.0,
4,100724,Alabama State University,2013,36104-0271,Doctor's degree - research/scholarship and pro...,Montgomery County,-86.295677,32.364317,Not applicable,Implied no,...,76.0,76.0,13.0,11.0,34.0,81.0,81.0,0.0,13202.0,


In [11]:
df['ZipCode'] = df['ZIP code']
df['Longitude'] = df['Longitude location of institution']
df['Latitude'] = df['Latitude location of institution']
df['%1stTimeUgrad-outofstate'] = df['Percent of first-time undergraduates - out-of-state']
df['%1stTimeUgrad-foreign'] = df['Percent of first-time undergraduates - foreign countries']

univ = df[['Name', 'ZipCode', 'Longitude', 'Latitude', '%1stTimeUgrad-outofstate', '%1stTimeUgrad-foreign']]
univ.head()

Unnamed: 0,Name,ZipCode,Longitude,Latitude,%1stTimeUgrad-outofstate,%1stTimeUgrad-foreign
0,Alabama A & M University,35762,-86.568502,34.783368,,
1,University of Alabama at Birmingham,35294-0110,-86.80917,33.50223,13.0,1.0
2,Amridge University,36117-3553,-86.17401,32.362609,,
3,University of Alabama in Huntsville,35899,-86.63842,34.722818,14.0,4.0
4,Alabama State University,36104-0271,-86.295677,32.364317,37.0,4.0
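
The cell above copies each long column into a new, shorter one and then selects the copies. As a hedged alternative, the same selection can be sketched with `DataFrame.rename`, which avoids adding duplicate columns to `df`. The two-row frame `raw` and the mapping `SHORT_NAMES` below are illustrative stand-ins, not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle data, using the original column names.
raw = pd.DataFrame({
    'Name': ['University A', 'University B'],
    'ZIP code': ['35294-0110', '35899'],
    'Longitude location of institution': [-86.80917, -86.63842],
    'Latitude location of institution': [33.50223, 34.722818],
    'Percent of first-time undergraduates - out-of-state': [13.0, 14.0],
    'Percent of first-time undergraduates - foreign countries': [1.0, 4.0],
})

# Mapping from the long source names to the short names used in the notebook.
SHORT_NAMES = {
    'ZIP code': 'ZipCode',
    'Longitude location of institution': 'Longitude',
    'Latitude location of institution': 'Latitude',
    'Percent of first-time undergraduates - out-of-state': '%1stTimeUgrad-outofstate',
    'Percent of first-time undergraduates - foreign countries': '%1stTimeUgrad-foreign',
}

# Select the needed columns first, then rename them in a single step.
univ_sketch = raw[['Name'] + list(SHORT_NAMES)].rename(columns=SHORT_NAMES)
print(univ_sketch.columns.tolist())
```

Because the selection happens before the rename, `raw` keeps only its original columns.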


In [12]:
univ2 = univ[univ['%1stTimeUgrad-outofstate'].notnull()]
univ2.head()

Unnamed: 0,Name,ZipCode,Longitude,Latitude,%1stTimeUgrad-outofstate,%1stTimeUgrad-foreign
1,University of Alabama at Birmingham,35294-0110,-86.80917,33.50223,13.0,1.0
3,University of Alabama in Huntsville,35899,-86.63842,34.722818,14.0,4.0
4,Alabama State University,36104-0271,-86.295677,32.364317,37.0,4.0
5,The University of Alabama,35487-0166,-87.545766,33.2144,57.0,3.0
7,Auburn University at Montgomery,36117-3596,-86.177351,32.369939,2.0,0.0


In [13]:
outofstate = univ2[univ2['%1stTimeUgrad-outofstate'] > 50].reset_index(drop=True)
outofstate.head()

Unnamed: 0,Name,ZipCode,Longitude,Latitude,%1stTimeUgrad-outofstate,%1stTimeUgrad-foreign
0,The University of Alabama,35487-0166,-87.545766,33.2144,57.0,3.0
1,Tuskegee University,36088-1920,-85.710315,32.431021,64.0,0.0
2,Embry-Riddle Aeronautical University-Prescott,86301-3720,-112.452285,34.615678,77.0,6.0
3,California Institute of Technology,91125,-118.12574,34.139275,56.0,8.0
4,Pomona College,91711-6319,-117.711944,34.098298,56.0,18.0


In [14]:
import json
import requests
from pandas import json_normalize  # the pandas.io.json import path is deprecated since pandas 1.0

In [19]:
CLIENT_ID = '5JI2S5KQPXVG342F1D3JF1GQD15GXSOG1PXJIHKFNCY34X0K' # your Foursquare ID
CLIENT_SECRET = 'ODLL5FOFURE1JOJ3QTN2AE0HTPQ32DQ3AIRBWY1US4S0YMMR' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

My credentials:
CLIENT_ID: 5JI2S5KQPXVG342F1D3JF1GQD15GXSOG1PXJIHKFNCY34X0K
CLIENT_SECRET: ODLL5FOFURE1JOJ3QTN2AE0HTPQ32DQ3AIRBWY1US4S0YMMR
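
Hard-coding and printing API credentials in a shared notebook exposes them to every reader. A minimal sketch of one common alternative is to read them from environment variables and print only a masked form. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for illustration; they are not defined by Foursquare or by the original notebook.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables.

    The variable names are hypothetical; choose whatever names suit your setup.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first `visible` characters of a secret and star out the rest."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)


# Usage with an explicit mapping, so the sketch runs without touching the real environment.
demo_env = {'FOURSQUARE_CLIENT_ID': 'DEMO_ID_12345', 'FOURSQUARE_CLIENT_SECRET': 'DEMO_SECRET_67890'}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID: ' + mask(demo_id))
print('CLIENT_SECRET: ' + mask(demo_secret))
```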


In [20]:
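LIMIT = 100   # maximum number of venues per request; must be defined before the URL is built
radius = 100  # search radius in meters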

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            outofstate['Latitude'][0], 
            outofstate['Longitude'][0], 
            radius, 
            LIMIT)
            
url

'https://api.foursquare.com/v2/venues/explore?&client_id=5JI2S5KQPXVG342F1D3JF1GQD15GXSOG1PXJIHKFNCY34X0K&client_secret=ODLL5FOFURE1JOJ3QTN2AE0HTPQ32DQ3AIRBWY1US4S0YMMR&v=20180605&ll=33.2144,-87.545766&radius=100&limit=100'
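
The URL above is assembled by string formatting, which leaves a stray `?&` and does not escape any values. As a sketch under the same endpoint and parameter names used in the cell above, the query string can instead be built with the standard library's `urlencode`. The helper name `build_explore_url` and the placeholder credentials are hypothetical.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build an explore URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude pair
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials; the coordinates are those of the first row of `outofstate`.
example_url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20180605', 33.2144, -87.545766, 100, 100)
print(example_url)
```

Note that `urlencode` percent-encodes the comma in `ll` as `%2C`, which is the standard encoding for that character in a query string.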

In [21]:
LIMIT = 100

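def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query Foursquare for venues within `radius` meters of each location and return them as one DataFrame."""
    venues_list = []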
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
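
The list comprehension in `getNearbyVenues` indexes `v['venue']['categories'][0]` directly, so a venue with an empty category list would raise an `IndexError`. The sketch below shows one way to make that extraction step more defensive. The helper `extract_venue_rows` and the two sample items are hypothetical; the items only mimic the nested keys that the function above reads.

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn a list of explore items into tuples, tolerating missing fields."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # Fall back to None when a venue has no category.
        category = categories[0].get('name') if categories else None
        rows.append((name, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows


# Made-up items shaped like the fields accessed in getNearbyVenues.
sample_items = [
    {'venue': {'name': 'Campus Cafe',
               'location': {'lat': 33.215, 'lng': -87.546},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Unlabeled Spot',
               'location': {'lat': 33.214, 'lng': -87.545},
               'categories': []}},
]

for row in extract_venue_rows('Example University', 33.2144, -87.545766, sample_items):
    print(row)
```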

In [22]:
UniversityVenues = getNearbyVenues(names=outofstate['Name'],
                                  latitudes=outofstate['Latitude'],
                                  longitudes=outofstate['Longitude'])

The University of Alabama
Tuskegee University
Embry-Riddle Aeronautical University-Prescott
California Institute of Technology
Pomona College
Colorado College
University of Denver
University of Bridgeport
Connecticut College
Fairfield University
University of Hartford
University of New Haven
Quinnipiac University
Sacred Heart University
Trinity College
Wesleyan University
Yale University
Delaware State University
University of Delaware
Gallaudet University
George Washington University
Georgetown University
Howard University
Eckerd College
Embry-Riddle Aeronautical University-Daytona Beach
University of Miami
The University of Tampa
Clark Atlanta University
Emory University
University of Chicago
Trinity International University-Illinois
Wheaton College
DePauw University
Earlham College
University of Notre Dame
Saint Mary's College
Taylor University
Briar Cliff University
Coe College
Cornell College
Dordt College
Drake University
Graceland University-Lamoni
Grinnell College
Loras College

In [25]:
UniversityVenues.shape

(2513, 7)
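
With one row per venue, the "variety of nearby venues" in the research question can be summarized per university. The following is a minimal sketch of one possible measure, counting total venues and distinct venue categories with `groupby`. It runs on a small made-up frame that reuses the column names of `UniversityVenues`; the metric names `venue_count` and `distinct_categories` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample with the same column names as UniversityVenues.
sample = pd.DataFrame({
    'Neighborhood': ['U1', 'U1', 'U1', 'U2', 'U2'],
    'Venue': ['Bean House', 'Daily Grind', 'Quad Park', 'Iron Gym', 'Corner Bar'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Gym', 'Bar'],
})

# Count all venues and the number of distinct categories around each university.
variety = (
    sample.groupby('Neighborhood')['Venue Category']
    .agg(venue_count='count', distinct_categories='nunique')
    .reset_index()
)
print(variety)
```

Keep in mind that each request is capped by `LIMIT`, so `venue_count` saturates at that value for dense areas, and the number of distinct categories may be the more informative of the two.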