## 4. Visualizing the most relevant MSc degrees

Maps help people see how far one university is from another, so they can plan their academic careers better. Here, we challenge you to show a map of the courses found with the score defined in point 3. You should be able to identify at least the *city* and *country* for each MSc degree. You can find some ideas on how to create maps in Python [here](https://plotly.com/python/maps/) and [here](https://towardsdatascience.com/visualizing-geospatial-data-in-python-e070374fe621), but a proper visualization may require further information, such as coordinates (latitude and longitude). You can retrieve this data using various tools:

1. [Here](https://medium.com/@manilwagle/geocoding-the-world-using-google-api-and-python-1f6b6fb6ca48) you can find a helpful tutorial on how to geocode locations using the Google API in Python (this tool can also be used in [Google Sheets](https://handsondataviz.org/geocode.html)).
2. You can collect a list of unique places in the format (City, Country) and ask ChatGPT (or, as usual, any other LLM chatbot) to provide you with a list of corresponding representative coordinates.
3. Explore and find the best solution for your case!
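
As an illustration of option 2, the list of unique places can be built before any geocoding is attempted. The following is a minimal sketch, not part of the assignment: the helper names `_is_missing` and `unique_places` are hypothetical, and the input is assumed to be an iterable of `(city, country)` pairs.

```python
import math
from typing import Iterable, List, Tuple


def _is_missing(value: object) -> bool:
    """Return True for None, NaN, or blank strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def unique_places(pairs: Iterable[Tuple[object, object]]) -> List[str]:
    """Build de-duplicated "City, Country" strings, keeping first-seen order."""
    seen = set()
    places = []
    for city, country in pairs:
        # Skip rows where either part is missing
        if _is_missing(city) or _is_missing(country):
            continue
        place = f"{str(city).strip()}, {str(country).strip()}"
        if place not in seen:
            seen.add(place)
            places.append(place)
    return places
```

With a pandas dataframe that has `city` and `country` columns, this could be called as `unique_places(zip(df["city"], df["country"]))`; the resulting list is what would be geocoded or pasted into a chatbot prompt.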
   
Once you have defined your visualization strategy, include a way to encode fees in your charts. The map should show (with a proper legend) the different courses and their associated fees: the user wants a glimpse not only of how far they will need to move but also of how much it will cost them!


In [1]:
import pandas as pd
import re
from geopy.geocoders import Nominatim
import ssl
import certifi
import geopy.geocoders
from geopy.exc import GeocoderTimedOut

import plotly.express as px
import warnings
warnings.filterwarnings('ignore')

# Make geopy verify HTTPS certificates against certifi's CA bundle
ctx = ssl.create_default_context(cafile=certifi.where())
geopy.geocoders.options.default_ssl_context = ctx

In [2]:
# Read the per-course TSV files and stack them into one dataframe
frames = []
for i in range(1, 6001):
    try:
        df1 = pd.read_csv(f"TSV/course_{i}.tsv", sep="\t", index_col=False)
        df1.index += i - 1  # offset the index so each row keeps its course position
        frames.append(df1)
    except Exception as e:
        print(i)
        print("Error: ", e)

# Concatenate once instead of growing the dataframe inside the loop
df = pd.concat(frames)

print(df.shape)
df.head()

(6000, 13)


Unnamed: 0,courseName,universityName,facultyName,isItFullTime,description,startDate,fees,modality,duration,city,country,administration,url
0,Computer Science - MSc,University of Hertfordshire,"School of Physics, Engineering and Computer Sc...",Full time,Why choose Herts?Industry Accreditation: Accre...,See Course,UK Students Full time: £9450 for the 2022/202...,MSc,"1 year full-time, 15 months full-time, 3 years...",Hatfield,United Kingdom,On Campus,https://www.findamasters.com/masters-degrees/c...
1,Computer Science (Cyber Security) - MSc,Staffordshire University,"School of Digital, Technologies and Arts",Full time,Join the fight against malicious programs and ...,September,Find the specific fees for your chosen program...,MSc,13 months - 25 months,Stoke on Trent,United Kingdom,On Campus,https://www.findamasters.com/masters-degrees/c...
2,Computer Science (Data Science) - MSc,Trinity College Dublin,School of Computer Science & Statistics,Full time,The MSc in Computer Science is an exciting one...,September,Please see the university website for further ...,MSc,1 year full-time,Dublin,Ireland,On Campus,https://www.findamasters.com/masters-degrees/c...
3,Computer Science (by Research) - MSc,Lancaster University,School of Computing and Communications,Full time,The MSc by Research programme can be tailored ...,See Course,Please see the university website for further ...,MSc,"12 months full-time, 24 months part time",Lancaster,United Kingdom,On Campus,https://www.findamasters.com/masters-degrees/c...
4,Computer Science (Computer Networks and Securi...,Staffordshire University,"School of Digital, Technologies and Arts",Full time,Secure your future career with our Computer Sc...,September,Find the specific fees for your chosen program...,MSc,13 months - 25 months,Stoke on Trent,United Kingdom,On Campus,https://www.findamasters.com/masters-degrees/c...


In [3]:
# Convert a fee string such as "£9,450" to a float, dropping the currency marker
def convert_to_numeric(value):
    # Remove currency codes and symbols, commas, and whitespace
    value = re.sub(r"eur|sek|chf|gbp|rmb|jpy|qr|[£€]|,|\s", "", value)
    return float(value)


def find_fees(text):
    if isinstance(text, str):
        # Removing patterns that contain years from the text ex. 2022/2023 so that our regex doesn't recognize it as a part of the fee
        text = re.sub(r"\b\d{4}/\d{4}\b|\b\d{4}/\d{2}\b", "", text)

        # Regular expression pattern for currency values
        currency_pattern = r"((eur|sek|chf|gbp|rmb|jpy|qr|[£€])\s?\d+(?:[.,\s]\d{3})*(?:[.,]\d{2})?|\d+(?:[.,\s]\d{3})*(?:[.,]\d{2})?\s?(eur|sek|chf|gbp|rmb|jpy|qr|[£€]))"
        matches = re.findall(currency_pattern, text)

        # Exchange rates
        exchange_rates = {
            "SEK": 0.08588,
            "GBP": 1.1443,
            "CHF": 1.03708,
            "JPY": 0.00618,
            "QR": 0.25672,
            "RMB": 0.12892,
        }

        # Converting to euros and calculating values
        numeric_values = []
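        for value in matches:
            value_numeric = convert_to_numeric(value[0])
            # value[1] is the currency captured before the amount; symbols (£, €) and
            # currencies written after the amount are not keys in exchange_rates,
            # so they fall back to a rate of 1
            currency = value[1].upper()
            numeric_values.append(value_numeric * exchange_rates.get(currency, 1))  # convert to euros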

        # Returning the maximum value or None if no values
        return max(numeric_values) if numeric_values else None
    else:
        return text

In [4]:
# Applying the function to the dataframe
df["fees"] = df["fees"].apply(find_fees)
df.rename(columns={"fees": "fees (euro)"}, inplace=True)
df[df["fees (euro)"].notna()].head()

Unnamed: 0,courseName,universityName,facultyName,isItFullTime,description,startDate,fees (euro),modality,duration,city,country,administration,url
0,Computer Science - MSc,University of Hertfordshire,"School of Physics, Engineering and Computer Sc...",Full time,Why choose Herts?Industry Accreditation: Accre...,See Course,16500.0,MSc,"1 year full-time, 15 months full-time, 3 years...",Hatfield,United Kingdom,On Campus,https://www.findamasters.com/masters-degrees/c...
29,Clinical Cognitive Neuroscience (MSc),Sheffield Hallam University,Postgraduate Courses,Full time,Develop a broad range of practical skills esse...,September,10310.0,MSc,"1 year full-time, 2 years part-time",Sheffield,United Kingdom,On Campus,https://www.findamasters.com/masters-degrees/c...
49,Fashion Forecasting & Data Analysis - MA/MSc,University for the Creative Arts,Business School for the Creative Industries,Full time,UCA's new MSc degree in Fashion Forecasting an...,September,10500.0,"MA, MSc",1 year full time,Farnham,United Kingdom,On Campus,https://www.findamasters.com/masters-degrees/c...
50,Facade Engineering - MSc,"University of the West of England, Bristol",Department of Architecture and the Built Envir...,Full time,Façade engineering is a discipline in its own ...,September,11500.0,MSc,"1 year full time, 2 years part time",Bristol,United Kingdom,On Campus,https://www.findamasters.com/masters-degrees/c...
51,Fashion Tech (Specializing Master),"POLI.design, Società consortile a responsabili...",Postgraduate Courses,Full time,The Fashion Tech designer has a decisive role ...,April,11000.0,"MSc, MA",13 months,Milan,Italy,On Campus,https://www.findamasters.com/masters-degrees/c...
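
In `find_fees` above, the currency is read from `value[1]`, the group that captures a currency written before the amount, and the rate lookup falls back to 1 when the key is missing. Because `£` is not a key in `exchange_rates`, an amount such as `£9450` keeps its pound value, and the lowercase pattern does not match uppercase codes such as `EUR`. The sketch below is one possible variant, not the notebook's method. The names `RATES_TO_EUR`, `SYMBOL_TO_CODE`, `parse_amount` and `max_fee_in_eur` are hypothetical, the rates are copied from the cell above, and the rule "a separator followed by exactly two final digits marks decimals" is an assumption.

```python
import re

# Rates to EUR copied from the cell above; EUR itself maps to 1
RATES_TO_EUR = {"EUR": 1.0, "SEK": 0.08588, "GBP": 1.1443, "CHF": 1.03708,
                "JPY": 0.00618, "QR": 0.25672, "RMB": 0.12892}
SYMBOL_TO_CODE = {"£": "GBP", "€": "EUR"}

CURRENCY = r"eur|sek|chf|gbp|rmb|jpy|qr|[£€]"
AMOUNT = r"\d+(?:[.,\s]\d{3})*(?:[.,]\d{2})?"
# Currency before the amount, or amount followed by a currency
FEE_RE = re.compile(
    rf"(?P<pre>{CURRENCY})\s?(?P<amt1>{AMOUNT})"
    rf"|(?P<amt2>{AMOUNT})\s?(?P<post>{CURRENCY})(?![a-z])",
    re.IGNORECASE,
)


def parse_amount(raw):
    """Turn '9,450.00', '1.500' or '16 500' into a float."""
    decimals = re.search(r"[.,](\d{2})$", raw)
    if decimals:
        whole, cents = raw[:decimals.start()], decimals.group(1)
    else:
        whole, cents = raw, "0"
    # Whatever separators remain are treated as thousands separators
    return float(re.sub(r"[.,\s]", "", whole) + "." + cents)


def max_fee_in_eur(text):
    """Return the largest fee found in text, converted to euros, or None."""
    if not isinstance(text, str):
        return None
    # Drop academic years such as 2022/2023 or 2022/23
    text = re.sub(r"\b\d{4}/(?:\d{4}|\d{2})\b", "", text)
    fees = []
    for m in FEE_RE.finditer(text):
        token = (m.group("pre") or m.group("post")).upper()
        code = SYMBOL_TO_CODE.get(token, token)
        amount = parse_amount(m.group("amt1") or m.group("amt2"))
        fees.append(amount * RATES_TO_EUR[code])
    return max(fees) if fees else None
```

Applying such a function instead of `find_fees` would change the values in the `fees (euro)` column, so the output shown above would no longer match.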


In [5]:
geolocator = Nominatim(user_agent="trial", scheme='http', timeout=5)  # 5-second timeout per request

# Cache of (latitude, longitude) tuples keyed by university name
location_cache = {}

def get_location(row):
    """Return (latitude, longitude) for a row, or (None, None) if nothing is found."""
    name = row['universityName']
    if name in location_cache:
        return location_cache[name]

    # Try the university name first, then fall back to "City, Country"
    for query in (name, f"{row['city']}, {row['country']}"):
        try:
            location = geolocator.geocode(query)
        except GeocoderTimedOut:
            continue
        if location:
            # Cache the result whichever query produced it
            location_cache[name] = (location.latitude, location.longitude)
            return location_cache[name]

    return None, None

# Apply the function to the dataframe to get latitude and longitude
df[['latitude', 'longitude']] = df.apply(lambda row: pd.Series(get_location(row)), axis=1)
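
The public Nominatim service asks clients to stay at or below one request per second, while `df.apply` above issues lookups back to back. geopy provides `geopy.extra.rate_limiter.RateLimiter` for this purpose. The sketch below shows the same idea in plain Python so that it can be tried offline: `make_cached_lookup` is a hypothetical helper, and the `geocode` argument is assumed to be any callable that returns an object with `latitude` and `longitude` attributes, or `None`.

```python
import time
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple

Coords = Tuple[Optional[float], Optional[float]]


def make_cached_lookup(geocode: Callable, min_delay_seconds: float = 1.0) -> Callable:
    """Wrap a geocode callable with a cache and a minimum delay between calls."""
    cache: Dict[Hashable, Coords] = {}
    last_call = float("-inf")  # no call has been made yet

    def lookup(key: Hashable, queries: Iterable[str]) -> Coords:
        nonlocal last_call
        if key in cache:
            return cache[key]
        for query in queries:
            # Sleep only for the part of the delay that has not yet elapsed
            wait = min_delay_seconds - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
            last_call = time.monotonic()
            try:
                hit = geocode(query)
            except Exception:
                # Assumption: any failure means "try the next query";
                # with geopy one would catch its service errors instead
                continue
            if hit is not None:
                cache[key] = (hit.latitude, hit.longitude)
                return cache[key]
        return (None, None)

    return lookup
```

With the notebook's objects, this might be used as `lookup = make_cached_lookup(geolocator.geocode)` followed by `lookup(row['universityName'], [row['universityName'], f"{row['city']}, {row['country']}"])`. Failed lookups are deliberately not cached, so they can be retried later.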


In [28]:
# Keep rows with a known city; copy so the new columns are not written to a view of df
df_tmp = df[df["city"].notna()].copy()

# Grouping by university and aggregating courses into a list
df_tmp['courses_list'] = df_tmp.groupby(['universityName'])['courseName'].transform(lambda x: '<br>'.join(x))

df_tmp['fees (euro)'] = df_tmp['fees (euro)'].astype(str)
df_tmp['fees_list'] = df_tmp.groupby(['universityName'])['fees (euro)'].transform(lambda x: ', '.join(x))


# Taking unique values for plotting
df_plot = df_tmp[['universityName', 'latitude', 'longitude', 'city', 'fees_list', 'courses_list']].drop_duplicates()

# Create map
fig = px.scatter_mapbox(df_plot, lat="latitude", lon="longitude", hover_name="universityName",
                        hover_data=["fees_list", "city", "courses_list"], zoom=3, color="city")

# Update layout for the map
fig.update_layout(
    mapbox=dict(
        style="carto-positron",
        zoom=3,
        center=dict(lat=df_plot['latitude'].mean(), lon=df_plot['longitude'].mean())
    ),
    showlegend=True
)

# Show the map
fig.show()
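
The map above colours markers by city and exposes fees only in the hover text. One way to make fees visible in the legend is to bin them into a few labelled bands and colour by band instead. The following is a minimal sketch under stated assumptions: the band boundaries in `FEE_BANDS` and the function name `fee_band` are illustrative choices, not values taken from the assignment.

```python
import math

# Upper bounds in euros (exclusive) with their legend labels; illustrative values
FEE_BANDS = [
    (5000, "under 5k"),
    (10000, "5k to 10k"),
    (20000, "10k to 20k"),
]


def fee_band(fee):
    """Map a numeric fee in euros to a legend label."""
    # Missing fees get their own category so they still appear on the map
    if fee is None or (isinstance(fee, float) and math.isnan(fee)):
        return "not listed"
    for upper, label in FEE_BANDS:
        if fee < upper:
            return label
    return "20k and above"
```

Such a function would need a numeric fee per university, for example the maximum fee per group computed before `fees (euro)` is cast to strings. The resulting column could then be passed as `color=` to `px.scatter_mapbox` in place of `"city"`.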

In [30]:
fig.write_html("map_of_masters.html")