<h1 align=center><font size = 6>Clustering World Top 50 Universities Based on Venues</font></h1>


## Table of contents
* ### [Introduction: Undiscovered Rankings](#introduction)
* ### [Data](#data)
* ### [1. Prepare the data](#Prepare_the_data)
* ### [2. Explore universities around the globe](#Explore)
* ### [3. Cluster Universities](#cluster)
* ### [4. Analysis of each cluster](#analysis)
* ### [Conclusion](#conclusion)


## Introduction: Undiscovered Rankings <a name="introduction"></a>


Each ranking of the world's top universities rests on its own criteria. Two of the most prestigious global rankings are:

**QS World University Rankings**
<img src="https://www.hse.ru/data/2018/06/06/1149772299/2qs2017-092700%20(3).jpg" width="200" />


**Times Higher Education World University Rankings**
<img src="https://www.timeshighereducation.com/sites/default/themes/custom/the_responsive/img/social/ranking-dataset-share.jpg" alt="drawing" width="200" >

Each of them scores universities on their publications, peer review by scholars and academics, and the faculty/student ratio. We also often see less serious rankings built on aspects of campus life such as canteens, dormitories and entertainment, but most of these cover only the universities of a particular region. I have not yet seen a ranking based on the surrounding environment of the university. That is why I want to explore this topic: clustering universities based on their surrounding venues.

As a traveler, I visit a local university whenever I arrive in a new city. I think a university reflects not only the spirit of its city but also the history of the city and even of the country. Students and faculty members usually arrange their daily activities within a short distance of the campus. The mix of venue types near a university is therefore shaped by the spirit of the university and the preferences of its people, and it shapes them in turn.


## Data <a name="data"></a>

University rankings are proprietary commercial information, but fortunately the major rankings largely agree on the best universities in the world. So we take the **Times Higher Education World University Rankings**, as published on Wikipedia, as our list of universities.
<img src="https://github.com/DwayneLi/Coursera_Capstone/blob/master/universities.png?raw=true" alt="drawing" width="800" />



Then we use a geocoder (Nominatim, through the geopy package) to find the location of each university, and we collect each university's surrounding venues through the Foursquare API. Finally, we use the k-means algorithm to identify the differences and similarities among universities.

## 1. Prepare the data <a name="Prepare_the_data"></a>

#### Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [134]:
!pip install wikipedia
import pandas as pd
import numpy as np
import wikipedia as wp
!pip install geocoder
import geocoder # import geocoder
!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

!pip install folium
import folium # map rendering library

import requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans



#### Find the table we need on Wikipedia
<img src="https://github.com/DwayneLi/Coursera_Capstone/blob/master/universities.png?raw=true" alt="drawing" width="800" align='left'/>


In [180]:
html = wp.page("Times Higher Education World University Rankings").html().encode("UTF-16")#Get the html source

tables = pd.read_html(html) # parse the page once and reuse the result
col_list=[]
for ind in range(len(tables)):
    if 'Institution' in tables[ind].columns:
        col_list.append(ind)
print('The indices of the tables containing "Institution" are {}.'.format(col_list))

The indices of the tables containing "Institution" are [2, 3, 4, 5].


#### Load the first matching Wikipedia table with pandas.

In [181]:
df = pd.read_html(html)[2]
df.head()

Unnamed: 0,Institution,2010–11[42],2011–12[43],2012–13[44],2013–14[45],2014–15[46],2015–16[47],2016–17[48],2017-18[49],2018–19[50],2019–20[51]
0,University of Oxford,6,4,2,2,3,2,1,1,1,1
1,California Institute of Technology,2,1,1,1,1,1,2,3,5,2
2,University of Cambridge,6,6,7,7,5,4,4,2,2,3
3,Stanford University,4,2,3,4,4,3,3,3,3,4
4,Massachusetts Institute of Technology,3,7,5,5,6,5,5,5,4,5


#### Select institutions and combine them with locations from the geolocator

In [187]:
lat=[]
lon=[]
names=[]
geolocator = Nominatim(user_agent="institution_explorer")

# get the universities' latitude and longitude from the geolocator
for name in df['Institution']:
    location = geolocator.geocode(name)
    if location is None:
        continue
    names.append(name)
    lat.append(location.latitude)
    lon.append(location.longitude)

# create a dataframe of university names with their locations
uni_data=pd.DataFrame(list(zip(names,lat,lon)))  
uni_data.columns=['University','Latitude','Longitude']
uni_data.head()

Unnamed: 0,University,Latitude,Longitude
0,University of Oxford,51.758708,-1.255668
1,California Institute of Technology,34.137102,-118.125275
2,University of Cambridge,52.199852,0.119739
3,Stanford University,37.431314,-122.169365
4,Massachusetts Institute of Technology,42.358396,-71.095678
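
The loop above skips any institution the geocoder cannot resolve, so a university can drop out of the analysis without notice. The following is a minimal sketch of one way to keep track of such misses. The helper `geocode_all` and the stand-in `fake_geocode` are hypothetical names introduced here for illustration; they are not part of the notebook or of geopy, and the stand-in only knows two coordinates copied from the table above.

```python
def geocode_all(names, geocode):
    """Geocode every name; return (rows that were found, names with no result).

    `geocode` is any callable that maps a name to a (lat, lon) tuple or None.
    """
    rows, missing = [], []
    for name in names:
        result = geocode(name)
        if result is None:
            missing.append(name)  # record the miss instead of dropping it silently
            continue
        lat, lon = result
        rows.append((name, lat, lon))
    return rows, missing


# Stand-in geocoder for illustration only; coordinates copied from the table above.
def fake_geocode(name):
    known = {
        'University of Oxford': (51.758708, -1.255668),
        'Stanford University': (37.431314, -122.169365),
    }
    return known.get(name)


rows, missing = geocode_all(
    ['University of Oxford', 'Nowhere University', 'Stanford University'],
    fake_geocode,
)
print('Geocoded:', [name for name, _, _ in rows])
print('Not geocoded:', missing)
```

In the notebook, the callable could wrap `geolocator.geocode` and return `(location.latitude, location.longitude)` or `None`, and `rows` could be passed to `pd.DataFrame` as before.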


#### Create a map of universities.

In [188]:
# create map of Universities using latitude and longitude values
universities = folium.Map(location=[uni_data.iloc[0,1], uni_data.iloc[0,2]], zoom_start=2.5, tiles='OpenStreetMap')

# add markers to map
for lat, lng,  university in zip(uni_data['Latitude'], uni_data['Longitude'], uni_data['University']):
    label = '{}'.format(university)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(universities)  
    
universities

### Unclustered universities
<img src="https://github.com/DwayneLi/Coursera_Capstone/blob/master/uncluster_universities.png?raw=true" alt="drawing" width="800" />


## 2. Explore universities around the globe <a name="Explore"></a>

#### Define Foursquare Credentials and Version

In [142]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20200615' # Foursquare API version


#### Create a function to repeat the same process for all the universities


In [189]:
def getNearbyVenues(names, latitudes, longitudes, radius=3000,LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print('Retrive ',name,' near by venues')
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Universities', 
                  'Universities Latitude', 
                  'Universities Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Fetch the venues near each university and collect them in a new dataframe called *university_venues*.

#### Because the API is not very stable, we divide the universities into five groups, run each group separately and concatenate the results later.

In [190]:
university_venues_g1 = getNearbyVenues(names=uni_data['University'][0:11],
                                   latitudes=uni_data['Latitude'][0:11],
                                   longitudes=uni_data['Longitude'][0:11]
                                  )

Retrive  University of Oxford  near by venues
Retrive  California Institute of Technology  near by venues
Retrive  University of Cambridge  near by venues
Retrive  Stanford University  near by venues
Retrive  Massachusetts Institute of Technology  near by venues
Retrive  Princeton University  near by venues
Retrive  Harvard University  near by venues
Retrive  Yale University  near by venues
Retrive  University of Chicago  near by venues
Retrive  Imperial College London  near by venues
Retrive  University of Pennsylvania  near by venues


In [191]:
university_venues_g2 = getNearbyVenues(names=uni_data['University'][11:21],
                                   latitudes=uni_data['Latitude'][11:21],
                                   longitudes=uni_data['Longitude'][11:21]
                                  )

Retrive  Johns Hopkins University  near by venues
Retrive  University of California, Berkeley  near by venues
Retrive  University College London  near by venues
Retrive  Columbia University  near by venues
Retrive  University of California, Los Angeles  near by venues
Retrive  University of Toronto  near by venues
Retrive  Cornell University  near by venues
Retrive  Duke University  near by venues
Retrive  University of Michigan  near by venues
Retrive  Northwestern University  near by venues


In [196]:
university_venues_g3 = getNearbyVenues(names=uni_data['University'][21:31],
                                   latitudes=uni_data['Latitude'][21:31],
                                   longitudes=uni_data['Longitude'][21:31]
                                  )


Retrive  Tsinghua University  near by venues
Retrive  Peking University  near by venues
Retrive  National University of Singapore  near by venues
Retrive  University of Washington  near by venues
Retrive  Carnegie Mellon University  near by venues
Retrive  London School of Economics and Political Science  near by venues
Retrive  New York University (NYU)  near by venues
Retrive  University of Edinburgh  near by venues
Retrive  University of California, San Diego  near by venues
Retrive  University of Melbourne  near by venues


In [197]:
university_venues_g4 = getNearbyVenues(names=uni_data['University'][31:41],
                                   latitudes=uni_data['Latitude'][31:41],
                                   longitudes=uni_data['Longitude'][31:41]
                                  )

Retrive  University of British Columbia  near by venues
Retrive  University of Hong Kong  near by venues
Retrive  King's College London  near by venues
Retrive  University of Tokyo  near by venues
Retrive  École Polytechnique Fédérale de Lausanne  near by venues
Retrive  Georgia Institute of Technology  near by venues
Retrive  University of Texas at Austin  near by venues
Retrive  McGill University  near by venues
Retrive  Technical University of Munich  near by venues
Retrive  Heidelberg University  near by venues


In [198]:
university_venues_g5 = getNearbyVenues(names=uni_data['University'][41:],
                                   latitudes=uni_data['Latitude'][41:],
                                   longitudes=uni_data['Longitude'][41:]
                                  )

Retrive  Hong Kong University of Science and Technology  near by venues
Retrive  University of Illinois at Urbana–Champaign  near by venues
Retrive  Nanyang Technological University  near by venues
Retrive  Australian National University  near by venues


#### Check the size of our final data.

In [199]:
groups=[university_venues_g1,university_venues_g2,university_venues_g3,university_venues_g4,university_venues_g5]
for i in range(len(groups)):
    print('Group',i+1,'has a length of ',len(groups[i]))
university_venues=pd.concat(groups)
print('Final data set have a total length: ',len(university_venues))

Group 1 has a length of  1100
Group 2 has a length of  1000
Group 3 has a length of  1000
Group 4 has a length of  847
Group 5 has a length of  326
Final data set have a total length:  4273


In [200]:
print(university_venues.shape)
university_venues.head()

(4273, 7)


Unnamed: 0,Universities,Universities Latitude,Universities Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,University of Oxford,51.758708,-1.255668,The University Parks,51.760633,-1.256281,Park
1,University of Oxford,51.758708,-1.255668,Oxford University Museum of Natural History,51.75869,-1.255595,History Museum
2,University of Oxford,51.758708,-1.255668,Old Parsonage Hotel,51.759477,-1.260088,Hotel
3,University of Oxford,51.758708,-1.255668,Blackwell's,51.754635,-1.255517,Bookstore
4,University of Oxford,51.758708,-1.255668,Pitt Rivers Museum,51.758806,-1.255379,History Museum


Save the data frame to csv for future data exploration.

In [201]:
university_venues.to_csv('university_venues.csv')

#### Let's check how many venues were returned for each university.

In [202]:
university_venues.groupby('Universities').count().head()

Universities,Universities Latitude,Universities Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Australian National University,76,76,76,76,76,76
California Institute of Technology,100,100,100,100,100,100
Carnegie Mellon University,100,100,100,100,100,100
Columbia University,100,100,100,100,100,100
Cornell University,100,100,100,100,100,100


#### Most universities return the maximum of 100 venues, although some return fewer. Let's find out how many unique categories can be curated from all the returned venues

In [203]:
print('There are {} unique categories.'.format(len(university_venues['Venue Category'].unique())))

There are 378 unique categories.


## 3. Cluster Universities<a name="cluster"></a>

#### Create dummy variables

In [204]:
university_onehot = pd.get_dummies(university_venues[['Venue Category']], prefix="", prefix_sep="")
university_onehot=pd.concat([university_venues['Universities'] ,university_onehot],axis=1)
university_onehot.head()

Unnamed: 0,Universities,Accessories Store,Afghan Restaurant,African Restaurant,Alternative Healer,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,...,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Yunnan Restaurant,Zoo
0,University of Oxford,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,University of Oxford,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,University of Oxford,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,University of Oxford,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,University of Oxford,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [205]:
university_onehot.shape

(4273, 379)

#### Next, let's group rows by university and take the mean of the frequency of occurrence of each category

In [206]:
university_grouped = university_onehot.groupby('Universities').mean().reset_index()
university_grouped.head()

Unnamed: 0,Universities,Accessories Store,Afghan Restaurant,African Restaurant,Alternative Healer,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,...,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Yunnan Restaurant,Zoo
0,Australian National University,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.013158,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,California Institute of Technology,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0
2,Carnegie Mellon University,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Columbia University,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,...,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.02,0.0,0.0
4,Cornell University,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size

In [207]:
university_grouped.shape

(45, 379)
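
To make the one-hot and group-by-mean steps concrete, here is a small sketch on toy data. The universities `A` and `B` and their venues are invented for illustration. The sketch also passes `dtype=int` to `pd.get_dummies` so that the columns are explicit 0/1 integers, which the notebook's own call does not do.

```python
import pandas as pd

# Toy data, invented for illustration: two universities and their nearby venue categories.
toy_venues = pd.DataFrame({
    'Universities': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Pub', 'Park', 'Park'],
})

# One column per category, one row per venue (1 where the venue has that category).
toy_onehot = pd.get_dummies(toy_venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)
toy_onehot = pd.concat([toy_venues['Universities'], toy_onehot], axis=1)

# The mean of a 0/1 column is the share of that university's venues in the category.
toy_grouped = toy_onehot.groupby('Universities').mean().reset_index()
print(toy_grouped)
```

University `A` has four venues, two of them cafés, so its `Café` frequency is 0.5. Each row of frequencies sums to 1, which means k-means later compares universities by the mix of venue types rather than by the raw number of venues.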

#### Let's print 3 universities along with their top 5 most common venues

In [208]:
num_top_venues = 5

for hood in university_grouped['Universities'][0:3]:
    print("----"+hood+"----")
    temp = university_grouped[university_grouped['Universities'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 3})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Australian National University----
            venue   freq
0     Coffee Shop  0.105
1            Park  0.105
2            Café  0.092
3  History Museum  0.039
4           Hotel  0.026


----California Institute of Technology----
          venue  freq
0   Pizza Place  0.06
1  Burger Joint  0.05
2        Garden  0.05
3          Café  0.04
4   Coffee Shop  0.04


----Carnegie Mellon University----
                     venue  freq
0              Coffee Shop  0.08
1              Pizza Place  0.06
2  New American Restaurant  0.05
3       Mexican Restaurant  0.04
4           Ice Cream Shop  0.04




#### Sort the venues in descending order

In [209]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#### Create the new dataframe and display the top 10 venues for each university.

In [210]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Universities']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
university_venues_sorted = pd.DataFrame(columns=columns)
university_venues_sorted['Universities'] = university_grouped['Universities']

for ind in np.arange(university_grouped.shape[0]):
    university_venues_sorted.iloc[ind, 1:] = return_most_common_venues(university_grouped.iloc[ind, :], num_top_venues)

university_venues_sorted.head()

Unnamed: 0,Universities,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Australian National University,Park,Coffee Shop,Café,History Museum,Pub,Restaurant,Hotel,Art Gallery,Wine Bar,Monument / Landmark
1,California Institute of Technology,Pizza Place,Garden,Burger Joint,Mexican Restaurant,American Restaurant,Coffee Shop,Gym / Fitness Center,Café,Mediterranean Restaurant,Bookstore
2,Carnegie Mellon University,Coffee Shop,Pizza Place,New American Restaurant,Ice Cream Shop,Mexican Restaurant,Bar,Park,American Restaurant,Sandwich Place,Burger Joint
3,Columbia University,Park,Coffee Shop,Southern / Soul Food Restaurant,Italian Restaurant,Pizza Place,Wine Shop,Ice Cream Shop,Spa,Farmers Market,Cocktail Bar
4,Cornell University,American Restaurant,Sandwich Place,Pizza Place,Coffee Shop,Park,Thai Restaurant,Café,Bagel Shop,Ice Cream Shop,Diner


#### Build the K-Means model

In [238]:
# set number of clusters
kclusters = 6

university_grouped_clustering = university_grouped.drop(columns='Universities')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(university_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 0, 0, 3, 0, 3, 3, 3, 0, 1])
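
The notebook fixes the number of clusters at 6 without showing how that value was chosen. One common way to compare candidate values is the silhouette score. The sketch below runs on synthetic data from `make_blobs`, which is an assumption made so the example is self-contained; it does not reproduce the notebook's results. In the notebook, `X` would be `university_grouped_clustering`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the real feature matrix: three well-separated groups.
X, _ = make_blobs(n_samples=90, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=0)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Silhouette is in [-1, 1]; higher means tighter, better-separated clusters.
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print('k={}: silhouette={:.3f}'.format(k, score))
print('Best k on this toy data:', best_k)
```

On real, high-dimensional frequency data the scores are usually much closer together, so the result is best read as a hint rather than a verdict.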

#### Add labels to venues data

In [239]:
uni_data.columns=['Universities','Latitude','Longitude']
uni_data.head()

Unnamed: 0,Universities,Latitude,Longitude
0,University of Oxford,51.758708,-1.255668
1,California Institute of Technology,34.137102,-118.125275
2,University of Cambridge,52.199852,0.119739
3,Stanford University,37.431314,-122.169365
4,Massachusetts Institute of Technology,42.358396,-71.095678


In [242]:
#university_venues_sorted.drop(columns=['Cluster Labels'],inplace=True)

# add clustering labels
university_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

university_merged = uni_data

# join the sorted venues (with cluster labels) onto uni_data to add latitude/longitude for each university
university_merged = university_merged.join(university_venues_sorted.set_index('Universities'), on='Universities')

university_merged # check the last columns!

Unnamed: 0,Universities,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,University of Oxford,51.758708,-1.255668,2,Pub,Coffee Shop,Café,Bakery,Movie Theater,Thai Restaurant,Restaurant,Ice Cream Shop,Sandwich Place,Chinese Restaurant
1,California Institute of Technology,34.137102,-118.125275,0,Pizza Place,Garden,Burger Joint,Mexican Restaurant,American Restaurant,Coffee Shop,Gym / Fitness Center,Café,Mediterranean Restaurant,Bookstore
2,University of Cambridge,52.199852,0.119739,2,Pub,Café,Coffee Shop,Gastropub,Burger Joint,Park,Indian Restaurant,Grocery Store,Bakery,Deli / Bodega
3,Stanford University,37.431314,-122.169365,3,Park,Gym / Fitness Center,Coffee Shop,Ice Cream Shop,Café,Art Museum,French Restaurant,Steakhouse,New American Restaurant,Furniture / Home Store
4,Massachusetts Institute of Technology,42.358396,-71.095678,3,Bakery,Coffee Shop,Seafood Restaurant,Salad Place,Pizza Place,New American Restaurant,Spa,Clothing Store,Mediterranean Restaurant,Italian Restaurant
5,Princeton University,40.338675,-74.658365,0,Pizza Place,Clothing Store,Hotel,Rental Car Location,Park,Coffee Shop,Ice Cream Shop,Trail,Mexican Restaurant,Lingerie Store
6,Harvard University,42.367909,-71.126782,3,Bakery,Pizza Place,Café,Park,Grocery Store,Ice Cream Shop,Pub,Vegetarian / Vegan Restaurant,Bookstore,Seafood Restaurant
7,Yale University,41.257131,-72.98967,0,Fast Food Restaurant,American Restaurant,Mexican Restaurant,Grocery Store,Pizza Place,Deli / Bodega,Gym / Fitness Center,Beach,Coffee Shop,Pharmacy
8,University of Chicago,41.785447,-87.593879,3,Park,Pizza Place,Grocery Store,Coffee Shop,Lounge,Bank,Art Gallery,History Museum,Bookstore,Café
9,Imperial College London,51.498871,-0.175608,5,Hotel,Garden,Café,Ice Cream Shop,Plaza,Park,Pub,Coffee Shop,Monument / Landmark,Restaurant


## 4. Analysis of each cluster <a name="analysis"></a>

###  Plot the clusters on map

In [244]:
# create map
map_clusters = folium.Map(location=[university_merged.iloc[0,1], university_merged.iloc[0,2]], zoom_start=2.5)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(university_merged['Latitude'], university_merged['Longitude'], university_merged['Universities'], university_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters


### Clustered universities
<img src="https://github.com/DwayneLi/Coursera_Capstone/blob/master/clustered_universities.png?raw=true" alt="drawing" width="800" />

### Cluster 1  --- Pizza

'Pizza Place' and 'Coffee Shop' rank high among the venues in cluster 1 (label 0 in the code).

In [245]:
university_merged.loc[university_merged['Cluster Labels'] == 0, university_merged.columns[[0] + list(range(4, university_merged.shape[1]))]]

Unnamed: 0,Universities,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,California Institute of Technology,Pizza Place,Garden,Burger Joint,Mexican Restaurant,American Restaurant,Coffee Shop,Gym / Fitness Center,Café,Mediterranean Restaurant,Bookstore
5,Princeton University,Pizza Place,Clothing Store,Hotel,Rental Car Location,Park,Coffee Shop,Ice Cream Shop,Trail,Mexican Restaurant,Lingerie Store
7,Yale University,Fast Food Restaurant,American Restaurant,Mexican Restaurant,Grocery Store,Pizza Place,Deli / Bodega,Gym / Fitness Center,Beach,Coffee Shop,Pharmacy
11,Johns Hopkins University,Pizza Place,Sushi Restaurant,Coffee Shop,Mexican Restaurant,Sandwich Place,Hotel,American Restaurant,Bakery,Ice Cream Shop,Asian Restaurant
12,"University of California, Berkeley",Pizza Place,Coffee Shop,Café,Theater,Park,Scenic Lookout,Japanese Restaurant,Salad Place,Ice Cream Shop,Indian Restaurant
17,Cornell University,American Restaurant,Sandwich Place,Pizza Place,Coffee Shop,Park,Thai Restaurant,Café,Bagel Shop,Ice Cream Shop,Diner
19,University of Michigan,Sandwich Place,Korean Restaurant,Park,Coffee Shop,Indian Restaurant,Gym / Fitness Center,Sushi Restaurant,Chinese Restaurant,Pizza Place,Nature Preserve
25,Carnegie Mellon University,Coffee Shop,Pizza Place,New American Restaurant,Ice Cream Shop,Mexican Restaurant,Bar,Park,American Restaurant,Sandwich Place,Burger Joint
40,Heidelberg University,Pizza Place,Bar,Convenience Store,Coffee Shop,Ice Cream Shop,Gas Station,Asian Restaurant,Theater,Bank,Office
42,University of Illinois at Urbana–Champaign,Coffee Shop,Bar,Mexican Restaurant,Park,Ice Cream Shop,Pizza Place,Chinese Restaurant,Bakery,Hotel,Sandwich Place


### Cluster 2   --- Chinese Restaurant

'Chinese Restaurant' is among the five most common venue categories for every university in cluster 2.

In [246]:
university_merged.loc[university_merged['Cluster Labels'] == 1, university_merged.columns[[0] + list(range(4, university_merged.shape[1]))]]

Unnamed: 0,Universities,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Tsinghua University,Chinese Restaurant,Café,Coffee Shop,Fast Food Restaurant,Sandwich Place,Pizza Place,Bar,Bakery,Korean Restaurant,Clothing Store
22,Peking University,Chinese Restaurant,Fast Food Restaurant,Pizza Place,Sandwich Place,Historic Site,Café,Coffee Shop,Korean Restaurant,Bar,Hotel
41,Hong Kong University of Science and Technology,Fast Food Restaurant,Chinese Restaurant,Cantonese Restaurant,Coffee Shop,Shopping Mall,Cha Chaan Teng,Clothing Store,Park,Beach,Hong Kong Restaurant
43,Nanyang Technological University,Fast Food Restaurant,Coffee Shop,Food Court,Japanese Restaurant,Chinese Restaurant,Asian Restaurant,Shopping Mall,Sandwich Place,Supermarket,Dessert Shop


### Cluster 3  --- Pub/Cafe/Park/Coffee Shop

From the names, we can see that these universities are located in the United Kingdom or in Commonwealth countries (Australia and Canada).

'Pub' and 'Café' are the defining characteristics of cluster 3. 'Coffee Shop' and 'Park' also appear often among the top three most common venues.

In [247]:
university_merged.loc[university_merged['Cluster Labels'] == 2, university_merged.columns[[0] + list(range(4, university_merged.shape[1]))]]

Unnamed: 0,Universities,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,University of Oxford,Pub,Coffee Shop,Café,Bakery,Movie Theater,Thai Restaurant,Restaurant,Ice Cream Shop,Sandwich Place,Chinese Restaurant
2,University of Cambridge,Pub,Café,Coffee Shop,Gastropub,Burger Joint,Park,Indian Restaurant,Grocery Store,Bakery,Deli / Bodega
28,University of Edinburgh,Café,Pub,Park,Coffee Shop,Hotel,Cocktail Bar,Beer Bar,Restaurant,Art Gallery,Sandwich Place
30,University of Melbourne,Café,Coffee Shop,Wine Bar,Bar,Deli / Bodega,Vegetarian / Vegan Restaurant,Cocktail Bar,Japanese Restaurant,Chinese Restaurant,Bookstore
31,University of British Columbia,Coffee Shop,Park,Beach,Garden,Café,Scenic Lookout,Restaurant,Frozen Yogurt Shop,Botanical Garden,Deli / Bodega
44,Australian National University,Park,Coffee Shop,Café,History Museum,Pub,Restaurant,Hotel,Art Gallery,Wine Bar,Monument / Landmark


### Cluster 4  --- Delicious food around the world

No single venue category dominates cluster 4. Instead, these universities are surrounded by a diverse mix of restaurants.

In [248]:
university_merged.loc[university_merged['Cluster Labels'] == 3, university_merged.columns[[0] + list(range(4, university_merged.shape[1]))]]

Unnamed: 0,Universities,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Stanford University,Park,Gym / Fitness Center,Coffee Shop,Ice Cream Shop,Café,Art Museum,French Restaurant,Steakhouse,New American Restaurant,Furniture / Home Store
4,Massachusetts Institute of Technology,Bakery,Coffee Shop,Seafood Restaurant,Salad Place,Pizza Place,New American Restaurant,Spa,Clothing Store,Mediterranean Restaurant,Italian Restaurant
6,Harvard University,Bakery,Pizza Place,Café,Park,Grocery Store,Ice Cream Shop,Pub,Vegetarian / Vegan Restaurant,Bookstore,Seafood Restaurant
8,University of Chicago,Park,Pizza Place,Grocery Store,Coffee Shop,Lounge,Bank,Art Gallery,History Museum,Bookstore,Café
10,University of Pennsylvania,Coffee Shop,Park,American Restaurant,Trail,Grocery Store,Art Museum,Pizza Place,Hotel,Ice Cream Shop,Outdoor Sculpture
14,Columbia University,Park,Coffee Shop,Southern / Soul Food Restaurant,Italian Restaurant,Pizza Place,Wine Shop,Ice Cream Shop,Spa,Farmers Market,Cocktail Bar
15,"University of California, Los Angeles",Coffee Shop,Italian Restaurant,Gym,Grocery Store,Hotel,Mediterranean Restaurant,Sandwich Place,Ice Cream Shop,Sushi Restaurant,Garden
16,University of Toronto,Coffee Shop,Bakery,Park,Sandwich Place,Café,Dessert Shop,Bookstore,Dance Studio,Caribbean Restaurant,French Restaurant
18,Duke University,Sandwich Place,Hotel,Mexican Restaurant,Gastropub,Café,Coffee Shop,Burger Joint,BBQ Joint,Bakery,Mediterranean Restaurant
20,Northwestern University,Coffee Shop,Beach,American Restaurant,Pizza Place,Bakery,Gym,Sushi Restaurant,Mexican Restaurant,Park,Brewery


### Cluster 5  --- Tokyo: a cluster of its own

In [249]:
university_merged.loc[university_merged['Cluster Labels'] == 4, university_merged.columns[[0] + list(range(4, university_merged.shape[1]))]]

Unnamed: 0,Universities,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
34,University of Tokyo,Convenience Store,Drugstore,Ramen Restaurant,Coffee Shop,Discount Store,Supermarket,Restaurant,Japanese Restaurant,Italian Restaurant,Shopping Mall


### Cluster 6  --- Art, Movie, Coffee and ... Hotel!
The universities in London form a separate group, probably because they are so close together that their 3 km search radii overlap and return many of the same venues. Another explanation is that the venue distribution reflects London's rich tourism resources (numerous hotels) and art resources. The most common venue is the hotel, followed by theaters, coffee shops and so on.

In [250]:
university_merged.loc[university_merged['Cluster Labels'] == 5, university_merged.columns[[0] + list(range(4, university_merged.shape[1]))]]

Unnamed: 0,Universities,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Imperial College London,Hotel,Garden,Café,Ice Cream Shop,Plaza,Park,Pub,Coffee Shop,Monument / Landmark,Restaurant
13,University College London,Hotel,Coffee Shop,Grocery Store,French Restaurant,Ice Cream Shop,Theater,Bookstore,Pedestrian Plaza,Plaza,Movie Theater
26,London School of Economics and Political Science,Hotel,Theater,Coffee Shop,Plaza,Grocery Store,Art Gallery,Ice Cream Shop,History Museum,Art Museum,Liquor Store
33,King's College London,Hotel,Coffee Shop,Theater,Grocery Store,Cocktail Bar,History Museum,Plaza,Bakery,Pedestrian Plaza,Performing Arts Venue
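
The idea that the London universities cluster together because they are close to each other can be checked by computing distances between campuses. Two search circles of radius 3000 m overlap whenever their centres are less than 6 km apart. Below is a minimal sketch of the haversine great-circle distance. The function name `haversine_km` is introduced here for illustration, and the Earth radius of 6371 km is an approximation. Because the London coordinates are not printed in this notebook, the demonstration uses the Oxford and Cambridge rows from the `uni_data` table instead.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points in kilometres."""
    earth_radius_km = 6371.0  # mean Earth radius; an approximation
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


# Coordinates taken from the uni_data table earlier in the notebook.
oxford = (51.758708, -1.255668)
cambridge = (52.199852, 0.119739)
print('Oxford to Cambridge: {:.1f} km'.format(haversine_km(*oxford, *cambridge)))
```

Applying the same function to every pair of rows in cluster 6 would show whether the campuses are indeed within 6 km of one another.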
