# **Achievement No. 6: Advanced Analytics and Dashboard Design (I)**

## 2026 QS World University Rankings

#### **Summary**

##### QS stands for Quacquarelli Symonds. It is a provider of services, analytics, and insight to the global higher education sector.

##### The QS World University Rankings is QS's annual ranking of the world's top universities. It is widely recognized and influential.

#### **Contents**

##### 1) Importing Libraries

##### 2) Importing Datasets

##### 3) Accuracy and Consistency Checks

##### 4) Completeness (Missing Values)

##### 5) Addressing Missing Values

##### - Addressing Missing Values in the 'Previous Rank', 'Size', 'Research', and 'Status' columns

##### - Addressing Missing Values in the Ranks and Scores columns

#### **1) Importing Libraries**

In [1]:
import pandas as pd
import numpy as np
import os

#### **2) Importing Datasets**

In [2]:
# Setting up the path

path = r'C:\Users\andd0\Documents\Jupyter_Advanced Analytics and Dashboard Design'

In [3]:
# Importing data sets

df_2026_ranking = pd.read_csv(os.path.join(path, '02 Data', 'Original Data', '2026 QS World University Rankings CSV.csv'), index_col = False)

In [4]:
# Note: I may need the 2025 ranking, so I'm importing it here too. Reading it with the default encoding raised an error, so I had to use the 'latin1' encoding

file_path = os.path.join(path, '02 Data', 'Original Data', '2025 QS World University Rankings CSV.csv')

df_2025_ranking = pd.read_csv(file_path, index_col=False, encoding='latin1')
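##### A minimal sketch of how the encoding choice could be automated instead of hard-coded. The helper name `read_csv_with_fallback` and the list of encodings are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

def read_csv_with_fallback(file_path, encodings=('utf-8', 'latin1')):
    """Try each encoding in turn and return (DataFrame, encoding_used).

    Hypothetical helper: the order of `encodings` is an assumption.
    """
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(file_path, index_col=False, encoding=enc), enc
        except UnicodeDecodeError as err:
            # Remember the failure and move on to the next candidate encoding
            last_error = err
    raise last_error
```

##### Note that 'latin1' can decode any byte sequence, so it never raises an error; it is worth spot-checking names with accented characters after loading.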

##### **Checking features of the 2026_ranking dataset**

In [5]:
# Columns and rows

df_2026_ranking.shape

(1501, 30)

In [6]:
# Data types

df_2026_ranking.dtypes

2026 Rank             object
Previous Rank         object
Institution Name      object
Country/Territory     object
Region                object
Size                  object
Focus                 object
Research              object
Status                object
AR SCORE             float64
AR RANK               object
ER SCORE             float64
ER RANK               object
FSR SCORE            float64
FSR RANK              object
CPF SCORE            float64
CPF RANK              object
IFR SCORE            float64
IFR RANK              object
ISR SCORE            float64
ISR RANK              object
ISD SCORE            float64
ISD RANK              object
IRN SCORE            float64
IRN RANK              object
EO SCORE             float64
EO RANK               object
SUS SCORE            float64
SUS RANK              object
Overall SCORE         object
dtype: object

##### Having a quick look at the 2025 ranking dataset

In [7]:
df_2025_ranking.shape

(1503, 28)

#### **3) Accuracy and Consistency Checks**

##### The data types are OK for the purposes of this analysis, except for the data type of the column *Overall SCORE*.

##### I may want to make some calculations with that column; therefore, I'll change its data type from object to float.

In [8]:
# This column contains hyphens; therefore, I first need to change those to NaN before I can change the column data type.

# Replace '-' with NaN
df_2026_ranking['Overall SCORE'] = df_2026_ranking['Overall SCORE'].replace('-', np.nan)

In [9]:
# Convert to float
df_2026_ranking['Overall SCORE'] = df_2026_ranking['Overall SCORE'].astype('float64')
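##### As an alternative to the two steps above, `pd.to_numeric` with `errors='coerce'` turns any unparseable entry into NaN in one pass. This is only a sketch on toy values (the series below is made up); it also shows how to list which entries were coerced, so that nothing other than the hyphen is silently lost.

```python
import pandas as pd

# Toy stand-in for the 'Overall SCORE' column (values are illustrative)
scores = pd.Series(['100', '96.1', '-', '25.1'], name='Overall SCORE')

# Anything that cannot be parsed as a number becomes NaN
converted = pd.to_numeric(scores, errors='coerce')

# Entries that were present in the raw data but could not be parsed
unparsed = scores[converted.isna() & scores.notna()]
print(unparsed.unique())
```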

In [10]:
# Checking whether the change was successful (it was)

df_2026_ranking.dtypes

2026 Rank             object
Previous Rank         object
Institution Name      object
Country/Territory     object
Region                object
Size                  object
Focus                 object
Research              object
Status                object
AR SCORE             float64
AR RANK               object
ER SCORE             float64
ER RANK               object
FSR SCORE            float64
FSR RANK              object
CPF SCORE            float64
CPF RANK              object
IFR SCORE            float64
IFR RANK              object
ISR SCORE            float64
ISR RANK              object
ISD SCORE            float64
ISD RANK              object
IRN SCORE            float64
IRN RANK              object
EO SCORE             float64
EO RANK               object
SUS SCORE            float64
SUS RANK              object
Overall SCORE        float64
dtype: object

In [11]:
# Descriptive statistics of quantitative variables

df_2026_ranking.describe()

Unnamed: 0,AR SCORE,ER SCORE,FSR SCORE,CPF SCORE,IFR SCORE,ISR SCORE,ISD SCORE,IRN SCORE,EO SCORE,SUS SCORE,Overall SCORE
count,1501.0,1501.0,1501.0,1501.0,1414.0,1464.0,1464.0,1499.0,1501.0,1477.0,703.0
mean,25.785943,26.944237,33.950433,30.425516,36.305658,33.32541,34.526981,53.356905,29.989674,51.254367,46.756046
std,24.500905,25.504494,28.440071,29.679882,35.252024,32.75066,31.1086,28.920632,29.197573,21.266331,18.842125
min,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,3.0,25.1
25%,8.8,8.5,10.8,6.0,6.6,5.9,8.675,27.5,6.2,35.7,30.9
50%,16.0,16.5,23.5,18.0,20.1,19.4,21.7,55.8,17.9,48.7,41.5
75%,32.7,37.5,50.5,49.7,66.1,56.7,55.65,78.5,46.0,66.5,58.9
max,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0


In [12]:
# Mode: To prevent Python from giving me a multimode answer (very likely), I specified just the first mode per column

df_2026_ranking.mode().iloc[0]

2026 Rank                           1201-1400
Previous Rank                       1001-1200
Institution Name               ADA University
Country/Territory    United States of America
Region                                   Asia
Size                                        L
Focus                                      FC
Research                                   VH
Status                                 Public
AR SCORE                                  6.4
AR RANK                                  701+
ER SCORE                                  8.2
ER RANK                                  701+
FSR SCORE                               100.0
FSR RANK                                 801+
CPF SCORE                                 1.4
CPF RANK                                 801+
IFR SCORE                               100.0
IFR RANK                                 801+
ISR SCORE                               100.0
ISR RANK                                 801+
ISD SCORE                         

#### **4) Completeness (Missing Values)**

In [13]:
# Checking how many missing values each column has

df_2026_ranking.isnull().sum()

2026 Rank              0
Previous Rank        112
Institution Name       0
Country/Territory      0
Region                 0
Size                   1
Focus                  0
Research               1
Status                47
AR SCORE               0
AR RANK                0
ER SCORE               0
ER RANK                0
FSR SCORE              0
FSR RANK               0
CPF SCORE              0
CPF RANK               0
IFR SCORE             87
IFR RANK              87
ISR SCORE             37
ISR RANK              37
ISD SCORE             37
ISD RANK              37
IRN SCORE              2
IRN RANK               2
EO SCORE               0
EO RANK                0
SUS SCORE             24
SUS RANK              24
Overall SCORE        798
dtype: int64
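##### Raw counts are easier to judge next to percentages (for instance, 798 of 1501 rows is roughly 53% of *Overall SCORE*). The sketch below, run on a made-up toy frame, shows one way to produce such a summary; the function name `missing_summary` is an assumption.

```python
import pandas as pd

def missing_summary(df):
    """Return missing counts and percentages for columns with at least one gap."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        'missing': counts,
        'percent': (counts / len(df) * 100).round(2),
    })
    return summary[summary['missing'] > 0].sort_values('missing', ascending=False)

# Toy data for illustration only
toy = pd.DataFrame({
    'Status': ['Public', None, None, 'Private'],
    'Size': ['L', 'M', None, 'S'],
    'AR SCORE': [1.0, 2.0, 3.0, 4.0],
})
summary = missing_summary(toy)
print(summary)
```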

#### **5) Addressing Missing Values**

##### **Column** *Previous Rank*

##### **Decision:** I will impute the 112 empty values with 'Not Ranked'.

##### **Justification:** After comparing the 2026 and 2025 rankings (xls files), I discovered that these universities do not appear in the 2025 ranking. 

In [14]:
df_2026_ranking['Previous Rank'] = df_2026_ranking['Previous Rank'].fillna('Not Ranked')


In [15]:
# Checking result (it worked)

df_2026_ranking['Previous Rank'].tail(15)

1486    Not Ranked
1487    Not Ranked
1488    Not Ranked
1489    Not Ranked
1490     1201-1400
1491    Not Ranked
1492         1401+
1493     1201-1400
1494     1201-1400
1495    Not Ranked
1496     1201-1400
1497     1201-1400
1498    Not Ranked
1499     1201-1400
1500     1201-1400
Name: Previous Rank, dtype: object
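##### The justification for *Previous Rank* rests on a manual comparison of the two files. A sketch of how that comparison could be done in code follows. It uses toy frames, and it assumes the 2025 dataset also has a column called 'Institution Name' with identically spelled names, which is not confirmed here.

```python
import pandas as pd

# Toy stand-ins for the two rankings (names and values are illustrative)
ranking_2026 = pd.DataFrame({
    'Institution Name': ['A University', 'B University', 'C University'],
    'Previous Rank': ['1', None, None],
})
ranking_2025 = pd.DataFrame({'Institution Name': ['A University', 'B University']})

missing_prev = ranking_2026['Previous Rank'].isnull()
in_2025 = ranking_2026['Institution Name'].isin(ranking_2025['Institution Name'])

# No previous rank, yet present in 2025: these rows deserve a closer look
needs_review = ranking_2026[missing_prev & in_2025]

# No previous rank and absent from 2025: safe to label
ranking_2026.loc[missing_prev & ~in_2025, 'Previous Rank'] = 'Not Ranked'
```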

##### **Columns** *Size & Research*

##### **Decision:** Each column has one missing value, and both belong to the same university: Islamic Azad University. I will impute *Size* with 'XL' and *Research* with 'N/A'.

##### **Justification:** A [support article](https://support.qs.com/hc/en-gb/articles/360021876820-QS-Institution-Classifications?utm_source=chatgpt.com) on the QS website states that universities with more than 30,000 students are classified as XL. According to the data available on the [QS website](https://www.topuniversities.com/universities/islamic-azad-university#p2-university-information), Islamic Azad University has more than 1 million students; therefore, its size is XL. There is no information on its research intensity.

In [16]:
row = df_2026_ranking[df_2026_ranking['Institution Name'] == 'Islamic Azad University']

In [17]:
row

Unnamed: 0,2026 Rank,Previous Rank,Institution Name,Country/Territory,Region,Size,Focus,Research,Status,AR SCORE,...,ISR RANK,ISD SCORE,ISD RANK,IRN SCORE,IRN RANK,EO SCORE,EO RANK,SUS SCORE,SUS RANK,Overall SCORE
1240,1201-1400,Not Ranked,Islamic Azad University,Iran (Islamic Republic of),Asia,,CO,,,6.1,...,801+,1.1,801+,96.3,55,35.7,482,29.4,801+,


In [18]:
# Imputing 'Size'

df_2026_ranking['Size'] = df_2026_ranking['Size'].fillna('XL')


In [19]:
# Imputing 'Research'

df_2026_ranking['Research'] = df_2026_ranking['Research'].fillna('N/A')


In [20]:
# Checking results

row_imputed = df_2026_ranking[df_2026_ranking['Institution Name'] == 'Islamic Azad University']

In [21]:
row_imputed

Unnamed: 0,2026 Rank,Previous Rank,Institution Name,Country/Territory,Region,Size,Focus,Research,Status,AR SCORE,...,ISR RANK,ISD SCORE,ISD RANK,IRN SCORE,IRN RANK,EO SCORE,EO RANK,SUS SCORE,SUS RANK,Overall SCORE
1240,1201-1400,Not Ranked,Islamic Azad University,Iran (Islamic Republic of),Asia,XL,CO,,,6.1,...,801+,1.1,801+,96.3,55,35.7,482,29.4,801+,
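##### Filling the whole column works here because each column has a single gap. If more gaps existed, a targeted assignment with `.loc` would restrict the change to the intended university. The toy frame below is made up to illustrate that idea.

```python
import pandas as pd

# Toy data: two universities lack 'Size', but only one should become 'XL'
toy = pd.DataFrame({
    'Institution Name': ['Islamic Azad University', 'Other University', 'Third University'],
    'Size': [None, None, 'L'],
    'Research': [None, 'VH', 'HI'],
})

is_target = toy['Institution Name'] == 'Islamic Azad University'

# Only rows that match the university AND are missing get the new value
toy.loc[is_target & toy['Size'].isnull(), 'Size'] = 'XL'
toy.loc[is_target & toy['Research'].isnull(), 'Research'] = 'N/A'
```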


##### **Column** *Status*

##### **Decision:** I will look up the status of each of the 47 universities online and impute it. Sources: Wikipedia and Google AI.

##### **Justification:** While Wikipedia and AI summaries are not sources one would rely on for more complex or sensitive information, a university's status is a simple fact that these sources can provide.

In [22]:
# Finding missing values under 'Status'

df_missing_status = df_2026_ranking[df_2026_ranking['Status'].isnull()]

In [23]:
df_missing_status

Unnamed: 0,2026 Rank,Previous Rank,Institution Name,Country/Territory,Region,Size,Focus,Research,Status,AR SCORE,...,ISR RANK,ISD SCORE,ISD RANK,IRN SCORE,IRN RANK,EO SCORE,EO RANK,SUS SCORE,SUS RANK,Overall SCORE
81,82,Not Ranked,Adelaide University,Australia,Oceania,XL,CO,VH,,78.3,...,96,72.5,251,96.7,46,70.4,207,86.1,98=,72.6
261,262,236,University of Aberdeen,United Kingdom,Europe,M,FC,VH,,40.1,...,185,88.3,155,89.7,176,30.0,549,87.8,83=,49.3
309,310,Not Ranked,"City St George’s, University of London",United Kingdom,Europe,L,FO,VH,,26.9,...,90,98.0,77,77.2,402,65.8,238,62.3,452=,44.5
382,381,355,University of Luxembourg,Luxembourg,Europe,M,FO,VH,,13.2,...,33,100.0,21,68.1,566,58.8,277,42.4,801+,39.4
487,487,380,Technische Universität Bergakademie Freiberg,Germany,Europe,S,FO,VH,,7.4,...,25,100.0,16,35.1,801+,1.4,801+,32.9,801+,32.8
519,519,440,Singapore University of Technology and Design,Singapore,Asia,S,SP,VH,,11.0,...,,,,35.2,801+,5.3,801+,25.0,801+,31.3
529,530,Not Ranked,The Education University of Hong Kong,"Hong Kong SAR, China",Asia,M,FO,VH,,9.4,...,285,61.7,325,43.4,801+,9.4,801+,35.3,801+,30.8
572,571,489,Isfahan University of Technology,Iran (Islamic Republic of),Asia,M,FO,VH,,7.8,...,801+,7.7,801+,45.7,801+,9.6,801+,28.0,801+,29.2
582,582,547,University of Iceland,Iceland,Europe,M,CO,VH,,9.6,...,761,24.2,700,75.5,432,89.3,110,49.4,719=,28.8
784,781-790,Not Ranked,Hong Kong Metropolitan University,"Hong Kong SAR, China",Asia,L,FO,HI,,26.5,...,147,81.9,189,11.2,801+,29.3,563,36.6,801+,


##### I will create a dictionary that maps each university with an empty 'Status' value to the status found online:

In [24]:
status_dict = {
    'Adelaide University': 'N/A',
    'University of Aberdeen': 'Public',
    'City St George’s, University of London': 'Public',
    'University of Luxembourg': 'Public',
    'Technische Universität Bergakademie Freiberg': 'Public',
    'Singapore University of Technology and Design': 'Public',
    'The Education University of Hong Kong': 'Public',
    'Isfahan University of Technology': 'Public',
    'University of Iceland': 'Public',
    'Hong Kong Metropolitan University': 'Public',
    'Norwegian University of Life Sciences (UMB)': 'Public',
    'University of Namur': 'Public',
    'Zurich University of Applied Sciences (ZHAW)': 'Public',
    'Addis Ababa University': 'Public',
    'Azerbaijan Technical University': 'Public',
    'University of Stavanger': 'Public',
    'Macao Polytechnic University': 'N/A',
    'Osaka Metropolitan University': 'Public',
    'Universidad de Valladolid': 'Public',
    'Universidad Europea de Madrid': 'Private for Profit',
    'Universidad de Córdoba - España': 'Public',
    'University of Cyberjaya': 'Private for Profit',
    'CEU University': 'Private not for Profit',
    'Universidad de León': 'Public',
    'Universitat de Lleida': 'Public',
    'Université de Bretagne Occidentale (UBO)': 'Public',
    'Université Sorbonne Paris Nord': 'Public',
    'University of Deusto': 'Private not for Profit',
    'University of Ibadan': 'Public',
    'University of Ioannina': 'Public',
    'University of Lagos': 'Public',
    'University of the Algarve': 'Public',
    'Ahmadu Bello University, Zaria': 'Public',
    'Islamic Azad University': 'Private for Profit',
    'Jahangirnagar University': 'Public',
    'Kwame Nkrumah University of Science and Technology': 'Public',
    'New Mexico State University': 'Public',
    'Technische Universität Kaiserslautern': 'Public',
    'UCAM Universidad Católica San Antonio de Murcia': 'Private not for Profit',
    'Universidad Nacional de Ingeniería Peru': 'Public',
    'Université de Limoges': 'Public',
    'University of Rajshahi': 'Public',
    'Khulna University': 'Public',
    'Rajshahi University of Engineering and Technology': 'Public',
    'San Francisco State University': 'Public',
    'Tongmyong University': 'Public',
    'University of Hawaii at Hilo': 'Public',
}

##### The mapping below assigns each finding to the corresponding university and leaves existing 'Status' values untouched

In [25]:
df_2026_ranking['Status'] = df_2026_ranking['Status'].fillna(df_2026_ranking['Institution Name'].map(status_dict))
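##### Because the lookup matches on exact names, a single different character (for example a curly versus a straight apostrophe) would silently leave a gap. The sketch below, on made-up toy data with a hypothetical `lookup` dictionary, shows the same map-then-fill pattern plus two quick checks for unmatched names.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    'Institution Name': ['University of Aberdeen', 'University of Iceland',
                         'Unlisted University', 'Some College'],
    'Status': [None, None, None, 'Private'],
})
lookup = {'University of Aberdeen': 'Public', 'University of Iceland': 'Public'}

# map() builds a Series of looked-up values; fillna() uses it only where Status is missing
toy['Status'] = toy['Status'].fillna(toy['Institution Name'].map(lookup))

# Universities whose Status is still missing after the mapping
still_missing = toy.loc[toy['Status'].isnull(), 'Institution Name'].tolist()

# Dictionary keys that matched no row at all (possible spelling mismatch)
unused_keys = sorted(set(lookup) - set(toy['Institution Name']))
```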

In [26]:
# Let's check some universities to see if the change was successful

row_status = df_2026_ranking[df_2026_ranking['Institution Name'].isin([
    'Adelaide University',
    'University of Aberdeen',
    'City St George’s, University of London',
    'University of Luxembourg'
])]

In [27]:
row_status

Unnamed: 0,2026 Rank,Previous Rank,Institution Name,Country/Territory,Region,Size,Focus,Research,Status,AR SCORE,...,ISR RANK,ISD SCORE,ISD RANK,IRN SCORE,IRN RANK,EO SCORE,EO RANK,SUS SCORE,SUS RANK,Overall SCORE
81,82,Not Ranked,Adelaide University,Australia,Oceania,XL,CO,VH,,78.3,...,96,72.5,251,96.7,46,70.4,207,86.1,98=,72.6
261,262,236,University of Aberdeen,United Kingdom,Europe,M,FC,VH,Public,40.1,...,185,88.3,155,89.7,176,30.0,549,87.8,83=,49.3
309,310,Not Ranked,"City St George’s, University of London",United Kingdom,Europe,L,FO,VH,Public,26.9,...,90,98.0,77,77.2,402,65.8,238,62.3,452=,44.5
382,381,355,University of Luxembourg,Luxembourg,Europe,M,FO,VH,Public,13.2,...,33,100.0,21,68.1,566,58.8,277,42.4,801+,39.4


#### **Missing values in Ranks and Scores**

##### The IFR, ISR, IRN, and SUS ranks/scores are present in both rankings (2025 and 2026). However, they are not measured the same way: the [2025](https://www.kaggle.com/datasets/melissamonfared/qs-world-university-rankings-2025?utm_source=chatgpt.com) dataset uses scores, while the [2026](https://www.kaggle.com/datasets/akashbommidi/2026-qs-world-university-rankings) one uses ratios/percentages.

##### The ISD rank/score (diversity among international students) appears to be present in the 2026 ranking only; at least, it is not clear what its equivalent is in the 2025 ranking.

##### For these reasons, the missing values cannot be filled in from the 2025 data. Empty values in the rank columns will be labelled 'Not Ranked', while empty values in the score columns will remain NaN so that those columns keep their float data type instead of changing to string.
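##### The same rule applies to every indicator, so it can also be expressed as a loop. This is a sketch on a made-up toy frame with only two indicators; it checks that rank and score are missing on exactly the same rows before labelling the ranks.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    'IFR SCORE': [55.0, np.nan, 12.5],
    'IFR RANK': ['120', None, '801+'],
    'SUS SCORE': [np.nan, 40.0, 30.0],
    'SUS RANK': [None, '500', '801+'],
})

for prefix in ['IFR', 'SUS']:
    rank_col, score_col = f'{prefix} RANK', f'{prefix} SCORE'
    # Rank and score should be missing on exactly the same rows
    assert (toy[rank_col].isnull() == toy[score_col].isnull()).all()
    # Label the rank; the score is left as NaN so the column stays float
    toy[rank_col] = toy[rank_col].fillna('Not Ranked')
```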

#### **IFR**

In [28]:
df_ifr_rank_missing = df_2026_ranking[df_2026_ranking['IFR RANK'].isnull()]

In [29]:
df_ifr_rank_missing.shape

(87, 30)

In [30]:
df_ifr_score_missing = df_2026_ranking[df_2026_ranking['IFR SCORE'].isnull()]

In [31]:
df_ifr_score_missing.shape

(87, 30)

##### Both counts match the number of missing values found under *4) Completeness* (87), so I proceed with the imputation.

In [32]:
df_2026_ranking['IFR RANK'] = df_2026_ranking['IFR RANK'].fillna('Not Ranked')

In [33]:
# No-op kept for explicitness: missing scores are already NaN, so the column stays float64
df_2026_ranking['IFR SCORE'] = df_2026_ranking['IFR SCORE'].fillna(np.nan)


In [34]:
# Checking results

df_2026_ranking[df_2026_ranking['IFR RANK'] == 'Not Ranked'][['Institution Name', 'IFR RANK', 'IFR SCORE']]

Unnamed: 0,Institution Name,IFR RANK,IFR SCORE
215,Indian Institute of Technology Kharagpur (IITKGP),Not Ranked,
355,National Technical University of Athens,Not Ranked,
361,University of Chinese Academy of Sciences (UCAS),Not Ranked,
464,Anna University,Not Ranked,
519,Singapore University of Technology and Design,Not Ranked,
...,...,...,...
1479,Universidad Nacional del Sur,Not Ranked,
1482,Universidade Federal do Rio Grande Do Norte,Not Ranked,
1491,University of Moratuwa,Not Ranked,
1493,University of Sri Jayewardenepura,Not Ranked,


#### **ISR**

In [35]:
df_isr_rank_missing = df_2026_ranking[df_2026_ranking['ISR RANK'].isnull()]

In [36]:
df_isr_rank_missing.shape

(37, 30)

In [37]:
df_isr_score_missing = df_2026_ranking[df_2026_ranking['ISR SCORE'].isnull()]

In [38]:
df_isr_score_missing.shape

(37, 30)

##### Both counts match the number of missing values found under *4) Completeness* (37), so I proceed with the imputation.

In [39]:
df_2026_ranking['ISR RANK'] = df_2026_ranking['ISR RANK'].fillna('Not Ranked')

In [40]:
# No-op kept for explicitness: missing scores are already NaN, so the column stays float64
df_2026_ranking['ISR SCORE'] = df_2026_ranking['ISR SCORE'].fillna(np.nan)


In [41]:
# Checking results

df_2026_ranking[df_2026_ranking['ISR RANK'] == 'Not Ranked'][['Institution Name', 'ISR RANK', 'ISR SCORE']]

Unnamed: 0,Institution Name,ISR RANK,ISR SCORE
519,Singapore University of Technology and Design,Not Ranked,
653,China University of Mining and Technology,Not Ranked,
706,Nanjing University of Science and Technology,Not Ranked,
724,Pakistan Institute of Engineering and Applied ...,Not Ranked,
762,Bangladesh University of Engineering and Techn...,Not Ranked,
788,Universidad de Antioquia,Not Ranked,
799,Norwegian University of Life Sciences (UMB),Not Ranked,
852,Addis Ababa University,Not Ranked,
871,Jiangnan University,Not Ranked,
888,Université de Franche-Comté,Not Ranked,


#### **IRN**

In [42]:
df_irn_rank_missing = df_2026_ranking[df_2026_ranking['IRN RANK'].isnull()]

In [43]:
df_irn_rank_missing.shape

(2, 30)

In [44]:
df_irn_score_missing = df_2026_ranking[df_2026_ranking['IRN SCORE'].isnull()]

In [45]:
df_irn_score_missing.shape

(2, 30)

##### Both counts match the number of missing values found under *4) Completeness* (2), so I proceed with the imputation.

In [46]:
df_2026_ranking['IRN RANK'] = df_2026_ranking['IRN RANK'].fillna('Not Ranked')

In [47]:
# No-op kept for explicitness: missing scores are already NaN, so the column stays float64
df_2026_ranking['IRN SCORE'] = df_2026_ranking['IRN SCORE'].fillna(np.nan)


In [48]:
# Checking results

df_2026_ranking[df_2026_ranking['IRN RANK'] == 'Not Ranked'][['Institution Name', 'IRN RANK', 'IRN SCORE']]

Unnamed: 0,Institution Name,IRN RANK,IRN SCORE
929,Universidad Católica Andrés Bello - UCAB,Not Ranked,
984,Universidad de Belgrano,Not Ranked,


#### **ISD**

In [49]:
df_isd_rank_missing = df_2026_ranking[df_2026_ranking['ISD RANK'].isnull()]

In [50]:
df_isd_rank_missing.shape

(37, 30)

In [51]:
df_isd_score_missing = df_2026_ranking[df_2026_ranking['ISD SCORE'].isnull()]

In [52]:
df_isd_score_missing.shape

(37, 30)

##### Both counts match the number of missing values found under *4) Completeness* (37), so I proceed with the imputation.

In [53]:
df_2026_ranking['ISD RANK'] = df_2026_ranking['ISD RANK'].fillna('Not Ranked')

In [54]:
# No-op kept for explicitness: missing scores are already NaN, so the column stays float64
df_2026_ranking['ISD SCORE'] = df_2026_ranking['ISD SCORE'].fillna(np.nan)


In [55]:
# Checking results

df_2026_ranking[df_2026_ranking['ISD RANK'] == 'Not Ranked'][['Institution Name', 'ISD RANK', 'ISD SCORE']]

Unnamed: 0,Institution Name,ISD RANK,ISD SCORE
519,Singapore University of Technology and Design,Not Ranked,
653,China University of Mining and Technology,Not Ranked,
706,Nanjing University of Science and Technology,Not Ranked,
724,Pakistan Institute of Engineering and Applied ...,Not Ranked,
762,Bangladesh University of Engineering and Techn...,Not Ranked,
788,Universidad de Antioquia,Not Ranked,
799,Norwegian University of Life Sciences (UMB),Not Ranked,
852,Addis Ababa University,Not Ranked,
871,Jiangnan University,Not Ranked,
888,Université de Franche-Comté,Not Ranked,


#### **SUS**

In [56]:
df_sus_rank_missing = df_2026_ranking[df_2026_ranking['SUS RANK'].isnull()]

In [57]:
df_sus_rank_missing.shape

(24, 30)

In [58]:
df_sus_score_missing = df_2026_ranking[df_2026_ranking['SUS SCORE'].isnull()]

In [59]:
df_sus_score_missing.shape

(24, 30)

##### Both counts match the number of missing values found under *4) Completeness* (24), so I proceed with the imputation.

In [60]:
df_2026_ranking['SUS RANK'] = df_2026_ranking['SUS RANK'].fillna('Not Ranked')

In [61]:
# No-op kept for explicitness: missing scores are already NaN, so the column stays float64
df_2026_ranking['SUS SCORE'] = df_2026_ranking['SUS SCORE'].fillna(np.nan)


In [62]:
# Checking results

df_2026_ranking[df_2026_ranking['SUS RANK'] == 'Not Ranked'][['Institution Name', 'SUS RANK', 'SUS SCORE']]

Unnamed: 0,Institution Name,SUS RANK,SUS SCORE
361,University of Chinese Academy of Sciences (UCAS),Not Ranked,
446,Belarusian State University,Not Ranked,
778,Universidad de Palermo,Not Ranked,
859,Belarusian National Technical University,Not Ranked,
929,Universidad Católica Andrés Bello - UCAB,Not Ranked,
984,Universidad de Belgrano,Not Ranked,
995,University of Cyberjaya,Not Ranked,
1010,Asfendiyarov Kazakh National Medical University,Not Ranked,
1033,German University of Technology in Oman (GUTECH),Not Ranked,
1040,Islamic University of Lebanon,Not Ranked,


#### **Accuracy and consistency checks after 1st cleaning**

In [63]:
# Rows and columns

df_2026_ranking.shape

(1501, 30)

In [64]:
# Missing values

df_2026_ranking.isnull().sum()

2026 Rank              0
Previous Rank          0
Institution Name       0
Country/Territory      0
Region                 0
Size                   0
Focus                  0
Research               0
Status                 0
AR SCORE               0
AR RANK                0
ER SCORE               0
ER RANK                0
FSR SCORE              0
FSR RANK               0
CPF SCORE              0
CPF RANK               0
IFR SCORE             87
IFR RANK               0
ISR SCORE             37
ISR RANK               0
ISD SCORE             37
ISD RANK               0
IRN SCORE              2
IRN RANK               0
EO SCORE               0
EO RANK                0
SUS SCORE             24
SUS RANK               0
Overall SCORE        798
dtype: int64

In [65]:
# Data types

df_2026_ranking.dtypes

2026 Rank             object
Previous Rank         object
Institution Name      object
Country/Territory     object
Region                object
Size                  object
Focus                 object
Research              object
Status                object
AR SCORE             float64
AR RANK               object
ER SCORE             float64
ER RANK               object
FSR SCORE            float64
FSR RANK              object
CPF SCORE            float64
CPF RANK              object
IFR SCORE            float64
IFR RANK              object
ISR SCORE            float64
ISR RANK              object
ISD SCORE            float64
ISD RANK              object
IRN SCORE            float64
IRN RANK              object
EO SCORE             float64
EO RANK               object
SUS SCORE            float64
SUS RANK              object
Overall SCORE        float64
dtype: object

##### **Notes:** 

##### - The non-zero counts in the missing-values output above (the first output below 'shape') are the NaN values intentionally kept in the score columns, not overlooked empty cells.

##### - The data types remain as needed.
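##### The two notes above could also be verified programmatically rather than by eye. The following is only a sketch: the function name `post_cleaning_problems` and the toy frame are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def post_cleaning_problems(df, text_cols, score_cols):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for col in text_cols:
        # Text and rank columns should have no gaps after cleaning
        n_missing = int(df[col].isnull().sum())
        if n_missing:
            problems.append(f'{col}: {n_missing} missing value(s) left')
    for col in score_cols:
        # Score columns may contain NaN but must remain float
        if not pd.api.types.is_float_dtype(df[col]):
            problems.append(f'{col}: expected float dtype, found {df[col].dtype}')
    return problems

# Toy data for illustration only
toy = pd.DataFrame({
    'IFR RANK': ['120', 'Not Ranked'],
    'IFR SCORE': [55.0, np.nan],
    'Status': ['Public', None],
})
problems = post_cleaning_problems(toy, text_cols=['IFR RANK', 'Status'], score_cols=['IFR SCORE'])
print(problems)
```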

In [66]:
# Let's check descriptive stats after the cleaning:

df_2026_ranking.describe()

Unnamed: 0,AR SCORE,ER SCORE,FSR SCORE,CPF SCORE,IFR SCORE,ISR SCORE,ISD SCORE,IRN SCORE,EO SCORE,SUS SCORE,Overall SCORE
count,1501.0,1501.0,1501.0,1501.0,1414.0,1464.0,1464.0,1499.0,1501.0,1477.0,703.0
mean,25.785943,26.944237,33.950433,30.425516,36.305658,33.32541,34.526981,53.356905,29.989674,51.254367,46.756046
std,24.500905,25.504494,28.440071,29.679882,35.252024,32.75066,31.1086,28.920632,29.197573,21.266331,18.842125
min,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,3.0,25.1
25%,8.8,8.5,10.8,6.0,6.6,5.9,8.675,27.5,6.2,35.7,30.9
50%,16.0,16.5,23.5,18.0,20.1,19.4,21.7,55.8,17.9,48.7,41.5
75%,32.7,37.5,50.5,49.7,66.1,56.7,55.65,78.5,46.0,66.5,58.9
max,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0


In [67]:
# Mode

df_2026_ranking.mode().iloc[0]

2026 Rank                           1201-1400
Previous Rank                       1001-1200
Institution Name               ADA University
Country/Territory    United States of America
Region                                   Asia
Size                                        L
Focus                                      FC
Research                                   VH
Status                                 Public
AR SCORE                                  6.4
AR RANK                                  701+
ER SCORE                                  8.2
ER RANK                                  701+
FSR SCORE                               100.0
FSR RANK                                 801+
CPF SCORE                                 1.4
CPF RANK                                 801+
IFR SCORE                               100.0
IFR RANK                                 801+
ISR SCORE                               100.0
ISR RANK                                 801+
ISD SCORE                         

#### **Exporting**

In [68]:
df_2026_ranking.to_csv(os.path.join(path, '02 Data', 'Prepared Data', 'Advanced Analytics and Dashboard Design (I).csv'))
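##### One point to keep in mind when the exported file is read back: by default, `pd.read_csv` treats the string 'N/A' as a missing value, so the 'N/A' labels imputed above would come back as NaN. The sketch below demonstrates this on a made-up two-row frame and an in-memory buffer, and shows one possible way around it with `keep_default_na=False`.

```python
import io
import pandas as pd

# Toy data for illustration only
cleaned = pd.DataFrame({
    'Institution Name': ['Islamic Azad University', 'University of Aberdeen'],
    'Research': ['N/A', 'VH'],
    'Overall SCORE': [float('nan'), 49.3],
})

buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)  # index=False avoids writing an extra unnamed column

# Default parsing: the 'N/A' label is interpreted as a missing value
buffer.seek(0)
default_read = pd.read_csv(buffer)

# Only empty cells count as missing, so the 'N/A' label survives as text
buffer.seek(0)
strict_read = pd.read_csv(buffer, keep_default_na=False, na_values=[''])
```

##### The 'Not Ranked' label is not affected, since it is not among the default missing-value markers.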