# 02 - Data from the Web

## Deadline
Wednesday October 25, 2017 at 11:59PM

## Important Notes
* Make sure you push your Notebook to GitHub with all the cells already evaluated (i.e., you don't want your colleagues to generate unnecessary Web traffic during the peer review)
* Don't forget to add a textual description of your thought process, the assumptions you made, and the solution you plan to implement!
* Please write all your comments in English, and use meaningful variable names in your code.

## Background
In this homework we will extract interesting information from www.topuniversities.com and www.timeshighereducation.com, two platforms that maintain a global ranking of worldwide universities. This ranking is not offered as a downloadable dataset, so you will have to find a way to scrape the information we need!
You are not allowed to manually download the entire ranking; rather, you have to understand how the server loads it in your browser. For this task, Postman with the Interceptor extension can help you greatly. We recommend that you watch this [brief tutorial](https://www.youtube.com/watch?v=jBjXVrS8nXs&list=PLM-7VG-sgbtD8qBnGeQM5nvlpqB_ktaLZ&autoplay=1) to quickly understand how to use it.

## Assignment
1. Obtain the 200 top-ranking universities in www.topuniversities.com ([ranking 2018](https://www.topuniversities.com/university-rankings/world-university-rankings/2018)). In particular, extract the following fields for each university: name, rank, country and region, number of faculty members (international and total) and number of students (international and total). Some of this information is not available in the main list and has to be found in each university's details page (e.g., this [details page](https://www.topuniversities.com/universities/ecole-polytechnique-fédérale-de-lausanne-epfl)).
Store the resulting dataset in a pandas DataFrame and answer the following questions:
- Which are the best universities in terms of: (a) ratio between faculty members and students, (b) ratio of international students?
- Answer the previous question by aggregating the data by (c) country and (d) region.

Plot your data using bar charts and describe briefly what you observed.

2. Obtain the 200 top-ranking universities in www.timeshighereducation.com ([ranking 2018](http://timeshighereducation.com/world-university-rankings/2018/world-ranking)). Repeat the analysis of the previous point and discuss briefly what you observed.

3. Merge the two DataFrames created in questions 1 and 2 using university names. Match universities' names as well as you can, and explain your strategy. Keep track of the original position in both rankings.

4. Find useful insights in the data by performing an exploratory analysis. Can you find a strong correlation between any pair of variables in the dataset you just created? Example: when a university is strong in its international dimension, can you observe a consistency both for students and faculty members?

5. Can you find the best university taking into consideration both rankings? Explain your approach.

Hints:
- Keep your Notebook clean and don't print the verbose output of the requests if this does not add useful information for the reader.
- In case of a tie, use the order defined in the webpage.


In [1]:
# Import libraries
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd
import seaborn
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
from IPython.core import display as ICD
%matplotlib inline

# QS RANKING
First, with a single request, we get the following data for each university in the ranking:
* name
* rank
* country
* region
* url

We then convert this information into a DataFrame.
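As a minimal offline sketch of this conversion, the snippet below parses a small hand-written payload. It assumes, as the code below does, that the JSON has a top-level `'data'` list with one record per university; the two records reuse values from the output further down and are trimmed to the fields we keep. The helper name `ranking_to_dataframe` is hypothetical.

```python
import json

import pandas as pd

# Hand-written payload mimicking the assumed structure of the QS ranking file:
# a top-level 'data' list with one dict per university.
payload_text = json.dumps({'data': [
    {'core_id': '573', 'title': 'Stanford University', 'rank_display': '2',
     'country': 'United States', 'region': 'North America',
     'url': '/universities/stanford-university', 'stars': ''},
    {'core_id': '253', 'title': 'Harvard University', 'rank_display': '3',
     'country': 'United States', 'region': 'North America',
     'url': '/universities/harvard-university', 'stars': ''},
]})


def ranking_to_dataframe(text, top_n=200):
    """Parse the JSON text and keep the first top_n universities with the fields we need."""
    records = json.loads(text)['data']
    df = pd.DataFrame(records).head(top_n)
    # index by the university id and keep only the columns of interest
    return df.set_index('core_id')[['title', 'rank_display', 'country', 'region', 'url']]


sample_df = ranking_to_dataframe(payload_text)
print(sample_df)
```

Selecting the columns explicitly makes the unwanted fields (here `stars`) disappear without listing them one by one.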

In [38]:
# Request the JSON file that the ranking page loads (URL found with Postman)
url_main = 'https://www.topuniversities.com'
r = requests.get(url_main + '/sites/default/files/qs-rankings-data/357051.txt')
print('Response status code: {0}\n'.format(r.status_code))
page_body = r.text

# Parse (deserialize) the JSON text into Python objects
rank_json = json.loads(page_body)

# Convert the list of universities into a DataFrame and keep the top 200
rank_df = pd.DataFrame.from_dict(rank_json['data']).head(200)
rank_df.drop(['logo', 'stars', 'nid', 'cc', 'score'], axis=1, inplace=True)
rank_df.set_index('core_id', inplace=True)
rank_df = rank_df[['title', 'rank_display', 'country', 'region', 'url']]
rank_df.head()

Response status code: 200



| core_id | title | rank_display | country | region | url |
|---|---|---|---|---|---|
| 410 | Massachusetts Institute of Technology (MIT) | 1 | United States | North America | /universities/massachusetts-institute-technolo... |
| 573 | Stanford University | 2 | United States | North America | /universities/stanford-university |
| 253 | Harvard University | 3 | United States | North America | /universities/harvard-university |
| 94 | California Institute of Technology (Caltech) | 4 | United States | North America | /universities/california-institute-technology-... |
| 95 | University of Cambridge | 5 | United Kingdom | Europe | /universities/university-cambridge |


Here we define two functions that retrieve more specific data _for every university_ from its details page.
Both functions take the dataframe above as argument and return a second dataframe with the requested information. This dataframe will be concatenated to the __rank_df__ dataframe above, so that the full information is available in a single dataframe.
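Both functions share the same final step: the scraped text (a count that may contain thousands separators and surrounding newlines) has to become an integer, and a tag that was not found becomes the -99 sentinel. A minimal standard-library sketch of that step, assuming the text holds a plain integer apart from commas and whitespace; the helper `parse_count` is hypothetical and not used by the notebook.

```python
MISSING = -99  # sentinel used in this notebook for values that could not be scraped


def parse_count(text):
    """Turn scraped text such as a newline-padded '1,234' into an int; None -> MISSING."""
    if text is None:
        # the tag was not found on the page
        return MISSING
    # drop thousands separators and surrounding whitespace before converting
    return int(text.replace(',', '').strip())


print(parse_count('\n1,234\n'))
print(parse_count(None))
```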

In [4]:
def my_find(html_attributes, new_df_column_name, rank_df):
    # For every university in rank_df, download its details page and return a one-column
    # DataFrame with the number stored in <tag class='number'> inside <tag class=html_attributes['class']>.
    # Missing values are set to -99.
    _tag = html_attributes['tag']
    _class = html_attributes['class']
    # _list temporarily stores the values found; it is converted to the returned DataFrame
    _list = []
    for url in rank_df.url:
        # request the details page of the university
        uni_body = requests.get(url_main + url).text
        soup = BeautifulSoup(uni_body, 'html.parser')
        # look for <_tag class=_class>, the box containing the value we are interested in
        soup1 = soup.find(_tag, class_=_class)
        # inside it, the value is stored in <_tag class='number'>
        soup2 = soup1.find(_tag, class_='number') if soup1 is not None else None
        # append the value if found, otherwise the -99 sentinel
        _list.append({new_df_column_name: soup2.text if soup2 is not None else -99})
    # strip newlines and thousands separators, then convert to int
    return (pd.DataFrame.from_dict(_list)
            .replace({r'\n': ''}, regex=True)
            .replace({r',': ''}, regex=True)
            .apply(pd.to_numeric).astype(int))

def find_Score_Citations(rank_df):
    # For every university in rank_df, return the 'Score citations' value shown on its
    # details page (missing values are set to -99).
    _list = []
    for url in rank_df.url:
        # request the details page of the university
        uni_body = requests.get(url_main + url).text
        soup = BeautifulSoup(uni_body, 'html.parser')
        value = -99
        # <ul class='score'> is the main tag identifying the collection of all the scores
        soup1 = soup.find('ul', class_='score')
        if soup1 is not None:
            scores = soup1.findAll('li', class_='barg pull-left')
            # citations are the third tag of this type; guard against pages with fewer scores
            if len(scores) > 2:
                soup3 = scores[2].find('div', class_='text')  # the actual "Score citations" value
                if soup3 is not None:
                    value = soup3.text
        _list.append({'Score citations': value})
    # convert _list to a DataFrame and return it (values are kept as scraped)
    return pd.DataFrame.from_dict(_list)

Here we use the functions defined above to retrieve the information needed:
* total faculty members
* international faculty members
* total students
* international students

We also retrieve the citations score, since we will later use it to determine the best university according to both the QS and THE rankings.

We then create two more columns:
* faculty members over students ratio
* international students over students ratio
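In the cell below, the ratios are computed directly from the scraped counts. Because missing values are stored as -99, a university with both counts missing gets (-99)/(-99) = 1.0, which is why NYU shows ratios of 1.0 in the missing-values check further down. A minimal sketch of an alternative, assuming the column names used in QS_df (the helper `add_ratios` and the toy rows are hypothetical): turn the sentinel into NaN before dividing, so the ratio becomes NaN instead of a misleading 1.0.

```python
import numpy as np
import pandas as pd

MISSING = -99  # sentinel for values that could not be scraped


def add_ratios(df):
    """Return a copy of df with fac_memb_ratio and int_stud_ratio; sentinels give NaN."""
    # replace the sentinel by NaN so that it propagates through the division
    counts = df[['fac_memb_tot', 'nb_stud_tot', 'nb_stud_int']].replace(MISSING, np.nan).astype(float)
    out = df.copy()
    out['fac_memb_ratio'] = counts['fac_memb_tot'] / counts['nb_stud_tot']
    out['int_stud_ratio'] = counts['nb_stud_int'] / counts['nb_stud_tot']
    return out


toy = pd.DataFrame({'name': ['A', 'B'],
                    'fac_memb_tot': [300, -99],
                    'nb_stud_tot': [1000, -99],
                    'nb_stud_int': [250, -99]})
result = add_ratios(toy)
print(result[['name', 'fac_memb_ratio', 'int_stud_ratio']])
```

NaN values are skipped by `max()` and `groupby(...).transform('max')`, so a university without data could no longer win a comparison by accident.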

In [17]:
# defining HTML tag and class attributes that we want to find
tofind = [{'tag':'div', 'class': 'total faculty'}, 
          {'tag':'div', 'class': 'inter faculty'}, 
          {'tag':'div', 'class': 'total student'}, 
          {'tag':'div', 'class': 'total inter'}]

# create a DataFrame with the scraped data (missing values are set to -99)
# note: each call below requests all the details pages again
details_df = pd.concat([my_find(tofind[0], 'fac_memb_tot', rank_df),
                        my_find(tofind[1], 'fac_memb_int', rank_df),
                        my_find(tofind[2], 'nb_stud_tot', rank_df),
                        my_find(tofind[3], 'nb_stud_int', rank_df),
                        find_Score_Citations(rank_df)], axis=1)


# concatenate the two DataFrames into a single one
details_df.set_index(rank_df.index, inplace=True)
QS_df = pd.concat([rank_df, details_df], axis=1)

# clean the merged DataFrame: drop the url column and strip the '=' tie marker from rank_display
QS_df.drop(['url'], axis=1, inplace=True)
QS_df.rank_display = QS_df.rank_display.replace({r'=': ''}, regex=True).apply(pd.to_numeric).astype(int)

# create the faculty members / students and international students / students ratios
# (converting to float makes sure the divisions yield floats)
QS_df['fac_memb_ratio'] = QS_df.fac_memb_tot.astype(float) / QS_df.nb_stud_tot.astype(float)
QS_df['int_stud_ratio'] = QS_df.nb_stud_int.astype(float) / QS_df.nb_stud_tot.astype(float)

# ------CLEANING THE DATAFRAME-----

# delete the intermediate objects that are no longer needed
del details_df, tofind, rank_df, r, page_body, rank_json

# reset the index and drop the 'core_id' column, which is no longer needed
QS_df.reset_index(inplace=True)
QS_df.drop('core_id', axis=1, inplace=True)

# rename the columns
QS_df.rename(columns={'title': 'name', 'rank_display': 'rank'}, inplace=True)

# reordering the columns
QS_df = QS_df[['name', 'rank', 'country', 'region','nb_stud_tot','nb_stud_int', 
               'fac_memb_tot', 'fac_memb_int','fac_memb_ratio', 'int_stud_ratio', 'Score citations']]

# let's have a look at the final QS_df
QS_df.head(20)
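The cell above strips the '=' tie marker from rank_display, so tied universities end up sharing one integer rank, and `best()` (defined later) returns every row that reaches the maximum. One way to follow the hint about ties (use the webpage order) is to keep the row position as an explicit tie-breaker, assuming the row order follows the webpage as it does in the outputs shown here. A minimal sketch on hypothetical rows; the column `page_pos` is illustrative.

```python
import pandas as pd

# Hypothetical rows listed in webpage order; '=3' marks a tie as in the scraped rank strings
toy = pd.DataFrame({'name': ['U1', 'U2', 'U3', 'U4'],
                    'rank_display': ['1', '2', '=3', '=3'],
                    'int_stud_ratio': [0.2, 0.4, 0.4, 0.1]})

# strip the tie marker and convert to int, as done for QS_df.rank_display
toy['rank'] = toy['rank_display'].str.replace('=', '', regex=False).astype(int)

# remember the webpage position, then sort by the metric and break ties by that position
toy['page_pos'] = range(len(toy))
by_ratio = toy.sort_values(['int_stud_ratio', 'page_pos'], ascending=[False, True])
winner = by_ratio.iloc[0]['name']
print(by_ratio['name'].tolist(), winner)
```

U2 and U3 share the highest ratio; U2 wins because it appears first on the page.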

# TIMES HIGHER EDUCATION RANKING

In [44]:
# Request the JSON file loaded by the THE ranking page and build THE_df
URL = ('https://www.timeshighereducation.com/sites/default/files/the_data_rankings/'
       'world_university_rankings_2018_limit0_369a9045a203e176392b9fb8f8c1cb2a.json')
r = requests.get(URL)
print('r = {r} // status_code = {status}'.format(r=r, status=r.status_code))
rank_json = json.loads(r.text)
THE_df = pd.DataFrame.from_dict(rank_json['data']).head(200)

# select the columns of interest (copy to avoid modifying a slice later) and rename them
THE_df = THE_df[['name', 'rank', 'location', 'stats_number_students', 'stats_student_staff_ratio',
                 'stats_pc_intl_students', 'scores_citations']].copy()
THE_df.columns = ['name', 'rank', 'country', 'nb_stud_tot', 'stats_student_staff_ratio',
                  'int_stud_ratio', 'Score citations']
THE_df.head()

r = <Response [200]> // status_code = 200


| | name | rank | country | nb_stud_tot | stats_student_staff_ratio | int_stud_ratio | Score citations |
|---|---|---|---|---|---|---|---|
| 0 | University of Oxford | 1 | United Kingdom | 20409 | 11.2 | 38% | 99.1 |
| 1 | University of Cambridge | 2 | United Kingdom | 18389 | 10.9 | 35% | 97.5 |
| 2 | California Institute of Technology | =3 | United States | 2209 | 6.5 | 27% | 99.5 |
| 3 | Stanford University | =3 | United States | 15845 | 7.5 | 22% | 99.9 |
| 4 | Massachusetts Institute of Technology | 5 | United States | 11177 | 8.7 | 34% | 100.0 |


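The dataset obtained needs to be cleaned before it can be used. The following routine operations have been done:
* converting dtypes 'object' to 'int' or 'float'
* removing special characters such as '='
* converting percentages to floats

Furthermore, the following quantities had to be computed from the available totals and ratios:
* number of faculty members (total students divided by the student/staff ratio)
* number of international students (total students multiplied by the share of international students)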

Observe that the number of _international faculty members_ for each university is missing in this dataset.
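The two derived counts follow directly from the published figures: faculty members = students / (students per staff member), and international students = students × share of international students. A minimal sketch of this arithmetic with a hypothetical helper `derive_counts`, which truncates to int as the cleaning cell below does with `astype(int)`; the inputs are the Oxford and Cambridge rows of the raw THE table above, and the results match the cleaned table further down.

```python
def derive_counts(nb_stud_tot, student_staff_ratio, pc_intl_students):
    """Derive THE counts from the published total and ratios.

    student_staff_ratio: students per staff member (e.g. 11.2)
    pc_intl_students: percentage string such as '38%'
    Counts are truncated to int, mirroring astype(int).
    """
    int_stud_ratio = float(pc_intl_students.rstrip('%')) / 100
    fac_memb_tot = int(nb_stud_tot / student_staff_ratio)
    nb_stud_int = int(nb_stud_tot * int_stud_ratio)
    fac_memb_ratio = 1.0 / student_staff_ratio  # faculty members per student
    return fac_memb_tot, nb_stud_int, fac_memb_ratio, int_stud_ratio


# University of Oxford: 20409 students, 11.2 students per staff member, 38% international
result = derive_counts(20409, 11.2, '38%')
print(result)
```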

In [45]:
# int_stud_ratio
THE_df['int_stud_ratio'] = THE_df['int_stud_ratio'].str.replace('%', '').astype('double')/100

# nb_stud_tot
THE_df['nb_stud_tot'] = THE_df['nb_stud_tot'].str.replace(',', '').astype('int')

# fac_memb_tot
THE_df['stats_student_staff_ratio']=THE_df['stats_student_staff_ratio'].astype(float)
THE_df['fac_memb_tot'] = THE_df['nb_stud_tot']/THE_df['stats_student_staff_ratio']
THE_df['fac_memb_tot'] = THE_df['fac_memb_tot'].astype(int)

# nb_stud_int
THE_df['nb_stud_int'] = THE_df['nb_stud_tot']*THE_df['int_stud_ratio']
THE_df['nb_stud_int'] = THE_df['nb_stud_int'].astype(int)

# fac_memb_ratio
THE_df['fac_memb_ratio'] = 1./THE_df['stats_student_staff_ratio']
THE_df=THE_df.drop('stats_student_staff_ratio', axis=1)

# rank
THE_df['rank'] = THE_df['rank'].astype(str)
THE_df['rank'] = THE_df['rank'].replace({r'=': ''}, regex=True).apply(pd.to_numeric).astype(int)

# order the columns as in QS_df
THE_df = THE_df[['name','rank','country','nb_stud_tot','nb_stud_int','fac_memb_tot',
                 'fac_memb_ratio','int_stud_ratio','Score citations']]

# let's now have a look at the clean THE_df
THE_df.head(20)

| | name | rank | country | nb_stud_tot | nb_stud_int | fac_memb_tot | fac_memb_ratio | int_stud_ratio | Score citations |
|---|---|---|---|---|---|---|---|---|---|
| 0 | University of Oxford | 1 | United Kingdom | 20409 | 7755 | 1822 | 0.089286 | 0.38 | 99.1 |
| 1 | University of Cambridge | 2 | United Kingdom | 18389 | 6436 | 1687 | 0.091743 | 0.35 | 97.5 |
| 2 | California Institute of Technology | 3 | United States | 2209 | 596 | 339 | 0.153846 | 0.27 | 99.5 |
| 3 | Stanford University | 3 | United States | 15845 | 3485 | 2112 | 0.133333 | 0.22 | 99.9 |
| 4 | Massachusetts Institute of Technology | 5 | United States | 11177 | 3800 | 1284 | 0.114943 | 0.34 | 100.0 |
| 5 | Harvard University | 6 | United States | 20326 | 5284 | 2283 | 0.11236 | 0.26 | 99.7 |
| 6 | Princeton University | 7 | United States | 7955 | 1909 | 958 | 0.120482 | 0.24 | 99.6 |
| 7 | Imperial College London | 8 | United Kingdom | 15857 | 8721 | 1390 | 0.087719 | 0.55 | 96.7 |
| 8 | University of Chicago | 9 | United States | 13525 | 3381 | 2181 | 0.16129 | 0.25 | 99.4 |
| 9 | ETH Zurich – Swiss Federal Institute of Techno... | 10 | Switzerland | 19233 | 7308 | 1317 | 0.068493 | 0.38 | 94.3 |


# RETRIEVING THE BEST UNIVERSITIES IN BOTH RANKINGS

Now we define some utility functions to retrieve the best universities _in both rankings_ by:
* faculty members over students
* international students over total students

Furthermore, we answer the same questions grouping the universities
* by country
* by region

These utility functions are the following:
* __are_there_nans_QS(df)__ and __are_there_nans_THE(df)__: count, for each column, the missing values (encoded as -99, hence negative). Two different functions are needed because the two datasets have different columns.
* __give_me_nans(df)__: retrieve the universities with at least one missing value
* __best(df, field)__: find the best university according to _field_
* __best_by(df, by, field)__: find the best university in each group, grouping the dataframe by the column _by_
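`best_by` keeps every row equal to its group maximum, so a group with a tie returns several universities. If exactly one winner per group is preferred, `groupby(...)[field].idxmax()` returns the index label of the first maximum in each group, which on a DataFrame in webpage order agrees with the tie hint. A minimal sketch on hypothetical rows; the function name `best_by_first` is illustrative.

```python
import pandas as pd


def best_by_first(df, by, field):
    """One row per group: the first row (in df order) reaching the group maximum of field."""
    winners = df.groupby(by)[field].idxmax()  # index label of the first maximum per group
    return df.loc[winners.values, [by, 'name', 'rank', field]].sort_values(by)


toy = pd.DataFrame({'name': ['U1', 'U2', 'U3', 'U4'],
                    'rank': [1, 2, 3, 4],
                    'region': ['Europe', 'Asia', 'Europe', 'Asia'],
                    'fac_memb_ratio': [0.30, 0.10, 0.30, 0.25]})
result = best_by_first(toy, 'region', 'fac_memb_ratio')
print(result)
```

In Europe, U1 and U3 tie at 0.30; only U1 is returned because it comes first.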

In [46]:
#------------------------------------
#-------UTILITY FUNCTIONS------------
#------------------------------------

def are_there_nans_QS(df):
    # count, per column, the missing values (encoded as -99, hence negative)
    table = df[['nb_stud_tot','nb_stud_int','fac_memb_tot','fac_memb_int','fac_memb_ratio','int_stud_ratio']]<0
    return table.sum()

def are_there_nans_THE(df):
    # same as above; THE_df has no fac_memb_int column
    table = df[['nb_stud_tot','nb_stud_int','fac_memb_tot','fac_memb_ratio','int_stud_ratio']]<0
    return table.sum()

def give_me_nans(df):
    # rows with at least one missing value (QS_df only, since it needs fac_memb_int)
    return pd.concat([df.loc[df['nb_stud_tot']<0],
                      df.loc[df['nb_stud_int']<0],
                      df.loc[df['fac_memb_tot']<0],
                      df.loc[df['fac_memb_int']<0],
                      df.loc[df['fac_memb_ratio']<0],
                      df.loc[df['int_stud_ratio']<0]]).drop_duplicates()

def best(df, field):
    # all the universities reaching the maximum value of field
    return df[df[field]==df[field].max()][['name','rank',field]]

def best_by(df, by, field):
    # for each group, the universities reaching the group maximum of field
    return df[df.groupby(by)[field].transform('max') == df[field]].sort_values(by)[[by,'name','rank',field]]

## First, check for missing values in the datasets
In QS_df there are some missing values (encoded as -99), belonging to NYU and IISc, as displayed below. NYU has no data at all, so it is excluded from QS_df (index 51) before searching for the best university. IISc only lacks the number of international faculty members, which the two ratios do not use, so it is kept.
In THE_df there are no missing values.
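A more compact way to list the rows containing the sentinel is a single boolean mask with `any(axis=1)`. A minimal sketch on hypothetical rows, with a hypothetical helper `rows_with_missing` that skips columns the DataFrame lacks, so the same function could serve both QS_df and THE_df:

```python
import pandas as pd

MISSING = -99  # sentinel used in this notebook for missing values


def rows_with_missing(df, columns, sentinel=MISSING):
    """Return the rows of df where at least one of the given columns equals the sentinel."""
    # ignore columns the DataFrame lacks (e.g. THE_df has no fac_memb_int)
    present = [col for col in columns if col in df.columns]
    return df[(df[present] == sentinel).any(axis=1)]


toy = pd.DataFrame({'name': ['A', 'B', 'C'],
                    'nb_stud_tot': [100, -99, 50],
                    'fac_memb_int': [5, -99, -99]})
missing = rows_with_missing(toy, ['nb_stud_tot', 'nb_stud_int', 'fac_memb_int'])
print(missing['name'].tolist())
```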

In [48]:
ICD.display(are_there_nans_QS(QS_df))
ICD.display(give_me_nans(QS_df))
are_there_nans_THE(THE_df)

nb_stud_tot       1
nb_stud_int       1
fac_memb_tot      1
fac_memb_int      2
fac_memb_ratio    0
int_stud_ratio    0
dtype: int64

| | name | rank | country | region | nb_stud_tot | nb_stud_int | fac_memb_tot | fac_memb_int | fac_memb_ratio | int_stud_ratio | Score citations |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 51 | New York University (NYU) | 52 | United States | North America | -99 | -99 | -99 | -99 | 1.0 | 1.0 | -99 |
| 189 | Indian Institute of Science (IISc) Bangalore | 190 | India | Asia | 4071 | 47 | 423 | -99 | 0.103906 | 0.011545 | 100 |


nb_stud_tot       0
nb_stud_int       0
fac_memb_tot      0
fac_memb_ratio    0
int_stud_ratio    0
dtype: int64

Below we retrieve the results:

In [37]:
# --------------QS--------------
print('--------%%%%%%%%-----------------%%%%%%%%%%%%-------------------\n')
print('------%%%%%%%%%%%%-------------%%%%%%%%%%%%%%-------------------\n')
print('-----%%%%------%%%%-----------%%%%------------------------------\n')
print('----%%%%--------%%%%-----------%%%%-----------------------------\n')
print('----%%%%--------%%%%-------------%%%%%%%%-----------------------\n')
print('----%%%%----%%%%-%%%%---------------%%%%%%%%--------------------\n')
print('-----%%%%----%%%%%%---------------------%%%%--------------------\n')
print('------%%%%%%%%%%%%%----------%%%%%%%%%%%%%%%--------------------\n')
print('--------%%%%%%%%%%%%-------%%%%%%%%%%%%%%%%---------------------\n')
print('-----------------%%%%-------------------------------------------\n')

# abs
print('\n\nBest university for faculty members to students ratio:')
ICD.display(   best(QS_df.drop(51),'fac_memb_ratio'))
print('\n\nBest university for international students to students ratio:')
ICD.display(   best(QS_df.drop(51),'int_stud_ratio'))

# groupby region
print('-------------------------------------------------------\n')
print('--------------------GROUP BY REGION--------------------\n')
print('-------------------------------------------------------')

print('\n\nBest university for faculty members to students ratio:')
ICD.display(   best_by(QS_df.drop(51),'region','fac_memb_ratio'))
print('\n\nBest university for international students to students ratio:')
ICD.display(   best_by(QS_df.drop(51),'region','int_stud_ratio'))

# groupby country
print('-------------------------------------------------------\n')
print('--------------------GROUP BY COUNTRY-------------------\n')
print('-------------------------------------------------------')

print('\n\nBest university for faculty members to students ratio:')
ICD.display(   best_by(QS_df.drop(51),'country','fac_memb_ratio'))
print('\n\nBest university for international students to students ratio:')
ICD.display(   best_by(QS_df.drop(51),'country','int_stud_ratio'))


# --------------THE--------------
print('--%%%%%%%%%%%%%%%%%%--%%%%-------%%%%-----%%%%%%%%%%%%%-------\n')
print('--%%%%%%%%%%%%%%%%%%--%%%%-------%%%%-----%%%%%%%%%%%%%-------\n')
print('--%%%%%%%%%%%%%%%%%%--%%%%-------%%%%-----%%%%%%%%%%%%%-------\n')
print('---------%%%%---------%%%%-------%%%%-----%%%%----------------\n')
print('---------%%%%---------%%%%%%%%%%%%%%%-----%%%%%%%%%%%%%-------\n')
print('---------%%%%---------%%%%%%%%%%%%%%%-----%%%%%%%%%%%%%-------\n')
print('---------%%%%---------%%%%%%%%%%%%%%%-----%%%%----------------\n')
print('---------%%%%---------%%%%-------%%%%-----%%%%----------------\n')
print('---------%%%%---------%%%%-------%%%%-----%%%%%%%%%%%%%-------\n')
print('---------%%%%---------%%%%-------%%%%-----%%%%%%%%%%%%%-------\n')
print('---------%%%%---------%%%%-------%%%%-----%%%%%%%%%%%%%-------\n')


# abs
print('\n\nBest university for faculty members to students ratio:')
ICD.display(   best(THE_df,'fac_memb_ratio'))
print('\n\nBest university for international students to students ratio:')
ICD.display(   best(THE_df,'int_stud_ratio'))

# groupby country
print('-------------------------------------------------------\n')
print('--------------------GROUP BY COUNTRY-------------------\n')
print('-------------------------------------------------------')

print('\n\nBest university for faculty members to students ratio:')
ICD.display(   best_by(THE_df,'country','fac_memb_ratio'))
print('\n\nBest university for international students to students ratio:')
ICD.display(   best_by(THE_df,'country','int_stud_ratio'))

--------%%%%%%%%-----------------%%%%%%%%%%%%-------------------

------%%%%%%%%%%%%-------------%%%%%%%%%%%%%%-------------------

-----%%%%------%%%%-----------%%%%------------------------------

----%%%%--------%%%%-----------%%%%-----------------------------

----%%%%--------%%%%-------------%%%%%%%%-----------------------

----%%%%----%%%%-%%%%---------------%%%%%%%%--------------------

-----%%%%----%%%%%%---------------------%%%%--------------------

------%%%%%%%%%%%%%----------%%%%%%%%%%%%%%%--------------------

--------%%%%%%%%%%%%-------%%%%%%%%%%%%%%%%---------------------

-----------------%%%%-------------------------------------------



Best university for faculty members to students ratio:


| | name | rank | fac_memb_ratio |
|---|---|---|---|
| 3 | California Institute of Technology (Caltech) | 4 | 0.422616 |




Best university for international students to students ratio:


| | name | rank | int_stud_ratio |
|---|---|---|---|
| 34 | London School of Economics and Political Scien... | 35 | 0.691393 |


-------------------------------------------------------

--------------------GROUP BY REGION--------------------

-------------------------------------------------------


Best university for faculty members to students ratio:


| | region | name | rank | fac_memb_ratio |
|---|---|---|---|---|
| 190 | Africa | University of Cape Town | 191 | 0.08845 |
| 70 | Asia | Pohang University of Science And Technology (P... | 71 | 0.213025 |
| 5 | Europe | University of Oxford | 6 | 0.342292 |
| 197 | Latin America | Instituto Tecnológico y de Estudios Superiores... | 199 | 0.136214 |
| 3 | North America | California Institute of Technology (Caltech) | 4 | 0.422616 |
| 19 | Oceania | The Australian National University | 20 | 0.110788 |




Best university for international students to students ratio:


| | region | name | rank | int_stud_ratio |
|---|---|---|---|---|
| 190 | Africa | University of Cape Town | 191 | 0.169703 |
| 25 | Asia | The University of Hong Kong | 26 | 0.407144 |
| 34 | Europe | London School of Economics and Political Scien... | 35 | 0.691393 |
| 74 | Latin America | Universidad de Buenos Aires (UBA) | 75 | 0.221658 |
| 47 | North America | Carnegie Mellon University | 47 | 0.478062 |
| 41 | Oceania | The University of Melbourne | 41 | 0.427434 |


-------------------------------------------------------

--------------------GROUP BY COUNTRY-------------------

-------------------------------------------------------


Best university for faculty members to students ratio:


| | country | name | rank | fac_memb_ratio |
|---|---|---|---|---|
| 74 | Argentina | Universidad de Buenos Aires (UBA) | 75 | 0.134267 |
| 19 | Australia | The Australian National University | 20 | 0.110788 |
| 153 | Austria | University of Vienna | 154 | 0.074205 |
| 181 | Belgium | Vrije Universiteit Brussel (VUB) | 182 | 0.19302 |
| 120 | Brazil | Universidade de São Paulo | 121 | 0.084948 |
| 139 | Canada | McMaster University | 140 | 0.136318 |
| 137 | Chile | Pontificia Universidad Católica de Chile (UC) | 137 | 0.083694 |
| 24 | China | Tsinghua University | 25 | 0.15168 |
| 116 | Denmark | Technical University of Denmark | 116 | 0.238455 |
| 102 | Finland | University of Helsinki | 102 | 0.11798 |




Best university for international students to students ratio:


| | country | name | rank | int_stud_ratio |
|---|---|---|---|---|
| 74 | Argentina | Universidad de Buenos Aires (UBA) | 75 | 0.221658 |
| 41 | Australia | The University of Melbourne | 41 | 0.427434 |
| 153 | Austria | University of Vienna | 154 | 0.314748 |
| 181 | Belgium | Vrije Universiteit Brussel (VUB) | 182 | 0.199591 |
| 182 | Brazil | Universidade Estadual de Campinas (Unicamp) | 182 | 0.036354 |
| 31 | Canada | McGill University | 32 | 0.330825 |
| 199 | Chile | Universidad de Chile | 201 | 0.054932 |
| 37 | China | Peking University | 38 | 0.168265 |
| 116 | Denmark | Technical University of Denmark | 116 | 0.236314 |
| 138 | Finland | Aalto University | 137 | 0.150737 |


--%%%%%%%%%%%%%%%%%%--%%%%-------%%%%-----%%%%%%%%%%%%%-------

--%%%%%%%%%%%%%%%%%%--%%%%-------%%%%-----%%%%%%%%%%%%%-------

--%%%%%%%%%%%%%%%%%%--%%%%-------%%%%-----%%%%%%%%%%%%%-------

---------%%%%---------%%%%-------%%%%-----%%%%----------------

---------%%%%---------%%%%%%%%%%%%%%%-----%%%%%%%%%%%%%-------

---------%%%%---------%%%%%%%%%%%%%%%-----%%%%%%%%%%%%%-------

---------%%%%---------%%%%%%%%%%%%%%%-----%%%%----------------

---------%%%%---------%%%%-------%%%%-----%%%%----------------

---------%%%%---------%%%%-------%%%%-----%%%%%%%%%%%%%-------

---------%%%%---------%%%%-------%%%%-----%%%%%%%%%%%%%-------

---------%%%%---------%%%%-------%%%%-----%%%%%%%%%%%%%-------



Best university for faculty members to students ratio:


| | name | rank | fac_memb_ratio |
|---|---|---|---|
| 105 | Vanderbilt University | 105 | 0.30303 |




Best university for international students to students ratio:


| | name | rank | int_stud_ratio |
|---|---|---|---|
| 24 | London School of Economics and Political Science | 25 | 0.71 |


-------------------------------------------------------

--------------------GROUP BY COUNTRY-------------------

-------------------------------------------------------


Best university for faculty members to students ratio:


| | country | name | rank | fac_memb_ratio |
|---|---|---|---|---|
| 47 | Australia | Australian National University | 48 | 0.051813 |
| 164 | Austria | University of Vienna | 165 | 0.048077 |
| 106 | Belgium | Ghent University | 107 | 0.027855 |
| 41 | Canada | McGill University | 42 | 0.075188 |
| 131 | China | University of Science and Technology of China | 132 | 0.121951 |
| 109 | Denmark | University of Copenhagen | 109 | 0.243902 |
| 89 | Finland | University of Helsinki | 90 | 0.061728 |
| 114 | France | École Polytechnique | 115 | 0.196078 |
| 34 | Germany | LMU Munich | 34 | 0.064103 |
| 119 | Hong Kong | City University of Hong Kong | 119 | 0.089286 |




Best university for international students to students ratio:


| | country | name | rank | int_stud_ratio |
|---|---|---|---|---|
| 31 | Australia | University of Melbourne | 32 | 0.4 |
| 164 | Austria | University of Vienna | 165 | 0.26 |
| 174 | Belgium | Université Libre de Bruxelles | 175 | 0.35 |
| 33 | Canada | University of British Columbia | 34 | 0.29 |
| 28 | China | Peking University | 27 | 0.16 |
| 152 | Denmark | Technical University of Denmark | 153 | 0.24 |
| 189 | Finland | Aalto University | 190 | 0.2 |
| 114 | France | École Polytechnique | 115 | 0.36 |
| 40 | Germany | Technical University of Munich | 41 | 0.23 |
| 39 | Hong Kong | University of Hong Kong | 40 | 0.42 |
