# EXPLORATORY DATA ANALYSIS CASE STUDY 
## ABOUT DATASET
### DATASET: World University Ranking (2022 - 2023)
### SOURCE: Kaggle

::: METHODOLOGY :::

The Center for World University Rankings (CWUR) publishes the only academic ranking of global universities that assesses the quality of education, alumni employment, quality of faculty, and research performance without relying on surveys and university data submissions.

CWUR uses seven objective and robust indicators grouped into four areas to rank the world’s universities:

Education: based on the academic success of a university’s alumni, and measured by the number of a university's alumni who have won prestigious academic distinctions relative to the university's size (25%)

Employability: based on the professional success of a university’s alumni, and measured by the number of a university's alumni who have held top positions at major companies relative to the university's size (25%)

Faculty: measured by the number of faculty members who have won prestigious academic distinctions (10%)

Research:
i) Research Output: measured by the total number of research papers (10%)
ii) High-Quality Publications: measured by the number of research papers appearing in top-tier journals (10%)
iii) Influence: measured by the number of research papers appearing in highly-influential journals (10%)
iv) Citations: measured by the number of highly-cited research papers (10%)

### LINK TO ABOVE METHODOLOGY TEXT: https://cwur.org/methodology/world-university-rankings.php
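
To make the weighting above concrete, here is a minimal sketch that records the seven indicator weights and checks that they add up to 100%. The `weighted_score` helper and the example scores are hypothetical: the methodology text does not state the exact formula CWUR uses to combine indicators, so a plain weighted average is assumed purely for illustration.

```python
# Indicator weights as listed in the methodology text above (in percent).
CWUR_WEIGHTS = {
    "Education": 25,
    "Employability": 25,
    "Faculty": 10,
    "Research Output": 10,
    "High-Quality Publications": 10,
    "Influence": 10,
    "Citations": 10,
}


def weighted_score(indicator_scores):
    """Combine per-indicator scores into one weighted average.

    Hypothetical helper: a plain weighted average is an assumption,
    not the formula CWUR is known to use.
    """
    total_weight = sum(CWUR_WEIGHTS.values())
    weighted_sum = sum(
        CWUR_WEIGHTS[name] * indicator_scores[name] for name in CWUR_WEIGHTS
    )
    return weighted_sum / total_weight


# Seven indicators whose weights cover the full 100%.
print(len(CWUR_WEIGHTS), sum(CWUR_WEIGHTS.values()))

# Made-up indicator scores, not taken from the dataset.
example_scores = {name: 80.0 for name in CWUR_WEIGHTS}
example_scores["Education"] = 100.0
print(weighted_score(example_scores))
```

Because Education and Employability carry 25% each, a change in either one moves the combined value two and a half times as much as the same change in any single research indicator.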

## 1 Importing the relevant libraries

In [1]:
import pandas as pd
import numpy as np

## 2 Reading and understanding the data

In [4]:
wrld_uni_ranks = pd.read_csv('WORLD UNIVERSITY RANKINGS.csv')
wrld_uni_ranks.head()

Unnamed: 0,World Rank,Institution,Location,National Rank,Education Rank,Employability Rank,Faculty Rank,Research Rank,Score
0,1,Harvard University,USA,1,1,1,1,1,100.0
1,2,Massachusetts Institute of Technology,USA,2,4,12,2,7,96.7
2,3,Stanford University,USA,3,11,4,3,2,95.1
3,4,University of Cambridge,United Kingdom,1,3,25,4,10,94.1
4,5,University of Oxford,United Kingdom,2,7,27,9,4,93.3


In [8]:
# Checking the number of rows and columns
wrld_uni_ranks.shape

(2000, 9)

In [10]:
# General information on data
wrld_uni_ranks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   World Rank          2000 non-null   int64  
 1   Institution         2000 non-null   object 
 2   Location            2000 non-null   object 
 3   National Rank       2000 non-null   int64  
 4   Education Rank      2000 non-null   object 
 5   Employability Rank  2000 non-null   object 
 6   Faculty Rank        2000 non-null   object 
 7   Research Rank       2000 non-null   object 
 8   Score               2000 non-null   float64
dtypes: float64(1), int64(2), object(6)
memory usage: 140.8+ KB


In [15]:
# Checking for duplicates
wrld_uni_ranks[wrld_uni_ranks.duplicated()]

Unnamed: 0,World Rank,Institution,Location,National Rank,Education Rank,Employability Rank,Faculty Rank,Research Rank,Score


An empty result indicates that the dataset has no duplicate rows
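
A count can be easier to read than an empty table. The sketch below uses a small invented DataFrame (not the rankings file) to show the difference between fully duplicated rows and repeated institution names, which `duplicated()` treats separately through its `subset` argument.

```python
import pandas as pd

# Toy data standing in for the rankings file; the values are made up.
toy = pd.DataFrame({
    "Institution": ["A University", "B University", "B University", "C University"],
    "Location": ["X", "Y", "Y", "Z"],
    "Score": [90.0, 85.0, 84.0, 80.0],
})

# Number of rows that are identical to an earlier row in every column.
n_duplicate_rows = int(toy.duplicated().sum())

# Number of rows whose institution name already appeared, ignoring other columns.
n_duplicate_names = int(toy.duplicated(subset=["Institution"]).sum())

print(n_duplicate_rows, n_duplicate_names)
```

In the toy data no row is a full duplicate, yet one institution name is repeated, so the two checks can disagree.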

In [17]:
# Checking Null value
wrld_uni_ranks.isna().sum()

World Rank            0
Institution           0
Location              0
National Rank         0
Education Rank        0
Employability Rank    0
Faculty Rank          0
Research Rank         0
Score                 0
dtype: int64

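### Insights
* The dataset consists of 2000 rows and 9 columns
* This implies that we are working with the top 2000 universities in the world
* The dataset does not contain duplicate rows
* The dataset does not have Null values, although four of the rank columns are stored as object dtype rather than integers, which suggests they contain non-numeric entries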

## 3 EXPLORATORY DATA ANALYSIS

In [19]:
list(wrld_uni_ranks.columns)

['World Rank',
 'Institution',
 'Location',
 'National Rank',
 'Education Rank',
 'Employability Rank',
 'Faculty Rank',
 'Research Rank',
 'Score']

In [36]:
# Quick overview of the countries these universities are based in
wrld_uni_ranks['Location'].sort_values().unique()

array(['Algeria', 'Argentina', 'Armenia', 'Australia', 'Austria',
       'Azerbaijan', 'Bangladesh', 'Belarus', 'Belgium', 'Brazil',
       'Bulgaria', 'Cameroon', 'Canada', 'Chile', 'China', 'Colombia',
       'Costa Rica', 'Croatia', 'Cyprus', 'Czech Republic', 'Denmark',
       'Ecuador', 'Egypt', 'Estonia', 'Ethiopia', 'Finland', 'France',
       'Georgia', 'Germany', 'Ghana', 'Greece', 'Hungary', 'Iceland',
       'India', 'Indonesia', 'Iran', 'Ireland', 'Israel', 'Italy',
       'Japan', 'Jordan', 'Kazakhstan', 'Kenya', 'Kuwait', 'Latvia',
       'Lebanon', 'Lithuania', 'Luxembourg', 'Malawi', 'Malaysia',
       'Malta', 'Mexico', 'Morocco', 'Nepal', 'Netherlands',
       'New Zealand', 'Nigeria', 'North Macedonia', 'Northern Cyprus',
       'Norway', 'Oman', 'Pakistan', 'Palestine', 'Peru', 'Philippines',
       'Poland', 'Portugal', 'Qatar', 'Romania', 'Russia', 'Saudi Arabia',
       'Serbia', 'Singapore', 'Slovak Republic', 'Slovenia',
       'South Africa', 'South Korea', 'S

The list above made me curious whether any South African universities made it into the top 2000

In [46]:
wrld_uni_ranks[wrld_uni_ranks['Location'] == 'South Africa'].groupby('Location')['Institution'].count()

Location
South Africa    12
Name: Institution, dtype: int64

With 12 South African universities in the list, I want to see which one is ranked highest, to end the superiority debate between Wits and Tuks students lol

In [77]:
# Quick look at the respective rankings on a global scale
wrld_uni_ranks[wrld_uni_ranks['Location'] == 'South Africa'][['World Rank', 'Institution']].sort_values(by='World Rank')

Unnamed: 0,World Rank,Institution
269,270,University of Cape Town
291,292,University of the Witwatersrand
440,441,Stellenbosch University
483,484,University of KwaZulu-Natal
554,555,University of Pretoria
628,629,University of Johannesburg
879,880,North-West University
1125,1126,University of the Free State
1185,1186,University of Western Cape
1301,1302,University of South Africa


In [171]:
# I actually thought of this when I was reviewing code for insights (last to be added)
# Using <= so that a university ranked exactly 1000 is counted in the top 1000
print('Number of Universities in the top 1000 World ranks:')
print(wrld_uni_ranks[(wrld_uni_ranks['Location'] == 'South Africa') & (wrld_uni_ranks['World Rank'] <= 1000)]['Institution'].count())
print('')
print('Number of Universities ranked 1001 to 2000 in the World ranks:')
print(wrld_uni_ranks[(wrld_uni_ranks['Location'] == 'South Africa') & (wrld_uni_ranks['World Rank'] > 1000)]['Institution'].count())

Number of Universities in the top 1000 World ranks:
7

Number of Universities ranked 1001 to 2000 in the World ranks:
5
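
When splitting ranks at 1000, a strict comparison on both sides would skip a university ranked exactly 1000. Here is a minimal sketch on invented rows (not the real rankings) that builds one inclusive mask and uses its negation for the other half, so the two counts always add up to the total.

```python
import pandas as pd

# Toy stand-in for the rankings table; the rows are invented.
toy = pd.DataFrame({
    "Location": ["South Africa", "South Africa", "South Africa", "USA"],
    "World Rank": [270, 1000, 1302, 1],
})

is_sa = toy["Location"] == "South Africa"
in_top_1000 = toy["World Rank"] <= 1000  # inclusive, so rank 1000 is counted

# Summing a boolean mask counts the rows where it is True.
top_count = int((is_sa & in_top_1000).sum())
rest_count = int((is_sa & ~in_top_1000).sum())

print(top_count, rest_count)
```

Negating a single mask avoids maintaining two separate thresholds that could drift apart.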


In [96]:
# University rankings in South Africa
wrld_uni_ranks[wrld_uni_ranks['Location'] == 'South Africa'][['National Rank', 'Institution', 'Score']].sort_values(by='National Rank')

Unnamed: 0,National Rank,Institution,Score
269,1,University of Cape Town,77.2
291,2,University of the Witwatersrand,76.9
440,3,Stellenbosch University,74.8
483,4,University of KwaZulu-Natal,74.3
554,5,University of Pretoria,73.6
628,6,University of Johannesburg,72.9
879,7,North-West University,71.0
1125,8,University of the Free State,69.5
1185,9,University of Western Cape,69.2
1301,10,University of South Africa,68.6


I will now look at the top 3 South African universities in each of the ranking categories

These are education, employability, faculty and research

In [97]:
wrld_uni_ranks[wrld_uni_ranks['Location'] == 'South Africa']

Unnamed: 0,World Rank,Institution,Location,National Rank,Education Rank,Employability Rank,Faculty Rank,Research Rank,Score
269,270,University of Cape Town,South Africa,1,178,237,-,250,77.2
291,292,University of the Witwatersrand,South Africa,2,193,99,-,327,76.9
440,441,Stellenbosch University,South Africa,3,-,195,-,462,74.8
483,484,University of KwaZulu-Natal,South Africa,4,497,361,-,475,74.3
554,555,University of Pretoria,South Africa,5,-,739,-,526,73.6
628,629,University of Johannesburg,South Africa,6,-,1040,-,599,72.9
879,880,North-West University,South Africa,7,-,-,-,837,71.0
1125,1126,University of the Free State,South Africa,8,-,-,-,1076,69.5
1185,1186,University of Western Cape,South Africa,9,-,-,-,1133,69.2
1301,1302,University of South Africa,South Africa,10,-,950,-,1249,68.6


For the purpose of this analysis I will treat '-' as meaning no rank rather than Null

Since I am interested in the top 3, I will discard any column with fewer than 3 numeric entries.

Therefore I will discard the Faculty category

(I will temporarily convert all '-' to 0, because the rank columns are stored as text and did not sort correctly during this phase)

In [157]:
# Creating a new dataset that replaces '-' with 0 and converts the rank columns from object to int
# This resolves the inaccurate sorting encountered in the top 3 analysis
wrld_uni_ranks2 = wrld_uni_ranks[(wrld_uni_ranks['Location'] == 'South Africa')].replace('-', 0)
wrld_uni_ranks2[['Education Rank', 'Employability Rank', 'Research Rank']] = wrld_uni_ranks2[['Education Rank', 'Employability Rank', 'Research Rank']].astype('int64')
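
As an alternative to a temporary 0 placeholder, the '-' entries could be turned into missing values so that no later step has to filter out zeros. This is a sketch on invented rows, not the approach used in this notebook. It relies on `pd.to_numeric` with `errors="coerce"`, which converts anything non-numeric to NaN.

```python
import pandas as pd

# Toy rows mimicking the '-' placeholder; the values are invented.
toy = pd.DataFrame({
    "Institution": ["A", "B", "C", "D"],
    "Education Rank": ["178", "-", "1040", "99"],
})

# Non-numeric entries such as '-' become NaN instead of 0.
toy["Education Rank"] = pd.to_numeric(toy["Education Rank"], errors="coerce")

# Drop unranked rows, then sort numerically and keep the best three.
top3 = (
    toy.dropna(subset=["Education Rank"])
       .sort_values(by="Education Rank")
       .head(3)[["Education Rank", "Institution"]]
)
print(top3)
```

One trade-off is that the converted column becomes float rather than int, since NaN has no integer representation in a plain int64 column.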

In [161]:
# Top 3 in the Education category
wrld_uni_ranks2[(wrld_uni_ranks2['Location'] == 'South Africa') & (wrld_uni_ranks2['Education Rank'] != 0)][['Education Rank', 'Institution']].sort_values(by='Education Rank').head(3)

Unnamed: 0,Education Rank,Institution
269,178,University of Cape Town
291,193,University of the Witwatersrand
1303,374,Rhodes University


In [156]:
# Top 3 in the Employability category
wrld_uni_ranks2[(wrld_uni_ranks2['Location'] == 'South Africa') & (wrld_uni_ranks2['Employability Rank'] != 0)][['Employability Rank', 'Institution']].sort_values(by='Employability Rank').head(3)

Unnamed: 0,Employability Rank,Institution
291,99,University of the Witwatersrand
440,195,Stellenbosch University
269,237,University of Cape Town


In [160]:
# Top 3 in the Research category
wrld_uni_ranks2[(wrld_uni_ranks2['Location'] == 'South Africa') & (wrld_uni_ranks2['Research Rank'] != 0)][['Research Rank', 'Institution']].sort_values(by='Research Rank').head(3)

Unnamed: 0,Research Rank,Institution
269,250,University of Cape Town
291,327,University of the Witwatersrand
440,462,Stellenbosch University
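
The three top 3 queries above share the same shape, so they could be folded into one small function. The sketch below is a hypothetical helper (`top_n_by_rank` is not part of the notebook) run on invented rows. It assumes the 'no rank' placeholder is 0, as in the temporary replacement used here.

```python
import pandas as pd


def top_n_by_rank(df, rank_col, n=3, unranked=0):
    """Return the n best-ranked rows for rank_col, skipping unranked rows.

    Hypothetical helper; `unranked` is the placeholder that marks 'no rank'.
    """
    ranked = df[df[rank_col] != unranked]
    return ranked[[rank_col, "Institution"]].sort_values(by=rank_col).head(n)


# Invented rows for illustration only.
toy = pd.DataFrame({
    "Institution": ["A", "B", "C", "D"],
    "Research Rank": [462, 0, 250, 327],
})

result = top_n_by_rank(toy, "Research Rank")
print(result)
```

With such a helper, each category would need only one call with a different column name, which keeps the filter and sort logic in a single place.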


### Insights

* There are 12 South African universities in the CWUR top 2000 world ranking
* 7 of these universities are in the top 1000 of the world ranking
* The other 5 are ranked between 1001 and 2000
* The University of Cape Town (UCT) is the highest-ranked South African university, at 270 in the world
* UCT is also 1st in South Africa in the Education category
* The University of the Witwatersrand (Wits) is 1st in South Africa in the Employability category
* UCT is 1st in South Africa in the Research category
* UCT and Wits are in the South African top 3 in all three categories analysed (Education, Employability and Research)
* On the Wits versus Tuks debate: Wits (world rank 292) is ranked above the University of Pretoria (world rank 555)

## Conclusion
If you are research focused, your best options in South Africa are, in order, the University of Cape Town, the University of the Witwatersrand and Stellenbosch University.

If your goal is to earn academic accolades, the Education ranking favours, in order, the University of Cape Town, the University of the Witwatersrand and Rhodes University.

If you are career focused, the Employability ranking suggests a better chance of holding a top position at a major company with a background from, in order, the University of the Witwatersrand, Stellenbosch University and the University of Cape Town.