# 4. Visualizing the most relevant MSc degrees
Maps help people see how far one university is from another, so they can plan their academic careers better. Here, we challenge you to show a map of the courses found with the score defined in point 3. You should be able to identify at least the city and country for each MSc degree. You can find some ideas on how to create maps in Python here and here, but you may need further information for a proper visualization, such as coordinates (latitude and longitude). You can retrieve this data using various tools:

1.   Here you can find a helpful tutorial on how to geocode places using the Google API in Python (this tool can also be used in Google Sheets)
2.   You can collect a list of unique places in the format (City, Country) and ask ChatGPT (or, as usual, any other LLM chatbot) to provide you with a list of corresponding representative coordinates
3.   Explore and find the best solution for your case!

Once you have defined your visualization strategy, include a way to encode fees in your charts. The map should show (with a proper legend) the different courses and their associated fees: users want a glimpse not only of how far they will need to move but also of how much it will cost them!
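One possible way to encode fees is to bucket them into a few bands and give each band its own marker color. The sketch below is only an illustration: the function name `fee_to_color`, the band thresholds, and the choice of colors are all assumptions, not part of the assignment.

```python
import math

# Hypothetical fee bands in USD; tune them to the distribution of your own data.
FEE_BANDS = [
    (10_000, "green"),    # up to 10,000 USD
    (25_000, "orange"),   # above 10,000 and up to 25,000 USD
    (math.inf, "red"),    # above 25,000 USD
]
UNKNOWN_FEE_COLOR = "gray"


def fee_to_color(fee):
    """Map a tuition fee in USD to a marker color name."""
    # Missing, NaN, zero, or negative fees are treated as "unknown"
    if fee is None or (isinstance(fee, float) and math.isnan(fee)) or fee <= 0:
        return UNKNOWN_FEE_COLOR
    # Return the color of the first band whose upper bound covers the fee
    for upper_bound, color in FEE_BANDS:
        if fee <= upper_bound:
            return color
    return UNKNOWN_FEE_COLOR
```

The returned names are among the colors accepted by `folium.Icon`, so a marker could be built with `folium.Icon(color=fee_to_color(fee))`.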

In [None]:
# Libraries

# Standard library
import os
import re
import string
import time

# Third-party
import folium
import nltk
import numpy as np
import pandas as pd
import requests
from folium import IFrame
from geopy.distance import geodesic
from geopy.geocoders import Nominatim
from IPython.display import display
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

nltk.download('stopwords')
nltk.download('punkt')

In [None]:
# The data are stored on Google Drive; the same files are available in the RQ 3 folder on GitHub
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Get the database from RQ3
db_from_RQ3 = pd.read_csv('/content/drive/MyDrive/ADM HW3/DB_from_RQ3.csv')

# Get the query result from RQ2
query_result = pd.read_csv('/content/drive/MyDrive/ADM HW3/Query_Result.csv')


In [None]:
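# Get the set of unique cities (missing values are dropped)
cities = set(db_from_RQ3['city'].dropna())
# Build a dictionary mapping each city to its coordinates and country.
# get_city_country_with_coordinates is defined in the next cell, so run that cell first.
city_country_coordinates = get_city_country_with_coordinates(cities)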

In [None]:
import time
from geopy.geocoders import Nominatim

def get_city_country_with_coordinates(cities):
    # Dictionary to store cities and their coordinates along with countries
    city_country_coordinates = {}

    # Create the geocoder once and reuse it for every city
    geolocator = Nominatim(user_agent="my_geocoder")

    # Loop through each city in the provided list
    for city in cities:

        # Get location coordinates for the current city
        location = geolocator.geocode(city)

        if location:
            # Extract latitude and longitude coordinates
            city_coordinates = (location.latitude, location.longitude)

            # The country is the last component of the returned address.
            # It comes back in the service's default language (e.g. "Deutschland").
            address_components = location.address.split(",")
            country = address_components[-1].strip()

            # Store city details in the dictionary
            city_country_coordinates[city] = (city_coordinates, country)
        else:
            # If the city is not found, set the coordinates to None
            city_country_coordinates[city] = (None, "")

        # Sleep for 1 second to avoid overloading the geolocator service
        time.sleep(1)

    return city_country_coordinates
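
Looking up a city by name alone can be ambiguous when several places share that name, and the assignment suggests collecting places as (City, Country) pairs. The sketch below shows one way to geocode such pairs while calling the service only once per unique pair. The names `geocode_places` and `geocode_fn` are hypothetical, and the geocoding function is passed in as a parameter so the logic can be tried without network access.

```python
import time


def geocode_places(places, geocode_fn, delay_seconds=1.0):
    """Geocode (city, country) pairs, calling the service once per unique pair.

    geocode_fn is any callable that takes a query string and returns
    a (latitude, longitude) tuple, or None when nothing is found.
    """
    results = {}
    for city, country in places:
        key = (city, country)
        if key in results:
            continue  # this pair was already looked up
        # Including the country in the query helps disambiguate the city
        results[key] = geocode_fn(f"{city}, {country}")
        # Pause between requests to avoid overloading the service
        time.sleep(delay_seconds)
    return results
```

In this notebook, `geocode_fn` could be a small wrapper around `geolocator.geocode` that returns `(location.latitude, location.longitude)` when a location is found and `None` otherwise.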


In [None]:
# Create a DataFrame
df_city_country_coordinates = pd.DataFrame(list(city_country_coordinates.items()), columns=['City', 'Data'])

# Split the Data column into 'Coordinates' and 'Country' and then drop the 'Data' column
df_city_country_coordinates[['Coordinates', 'Country']] = pd.DataFrame(df_city_country_coordinates['Data'].tolist(), index=df_city_country_coordinates.index)
df_city_country_coordinates = df_city_country_coordinates.drop('Data', axis=1)

df_city_country_coordinates.head()

Unnamed: 0,City,Coordinates,Country
0,Winnipeg,"(49.8955367, -97.1384584)",Canada
1,Brussels,"(50.8550018, 4.3512333761166175)",België / Belgique / Belgien
2,Northampton,"(52.2378853, -0.8963639)",United Kingdom
3,Sunderland,"(54.9058512, -1.3828727)",United Kingdom
4,Luneburg,"(53.248706, 10.407855)",Deutschland
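
If this table is saved with `to_csv` and later reloaded with `read_csv`, the `Coordinates` column comes back as strings such as `"(49.8955367, -97.1384584)"` rather than tuples, and the map code would then fail. A minimal sketch of a converter follows; `parse_coordinates` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import ast


def parse_coordinates(value):
    """Turn a '(lat, lon)' string into a tuple of floats; return None if invalid."""
    if isinstance(value, tuple):
        return value  # already parsed, nothing to do
    try:
        # literal_eval safely evaluates the string as a Python literal
        lat, lon = ast.literal_eval(value)
        return (float(lat), float(lon))
    except (ValueError, SyntaxError, TypeError):
        # Empty cells, NaN, or malformed text end up here
        return None
```

It could be applied with `df_city_country_coordinates['Coordinates'].apply(parse_coordinates)` right after reloading the file.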


In [None]:
def create_map(city_country_coordinates, query_results):
    # Center the map on the first city that was successfully geocoded
    map_center = city_country_coordinates['Coordinates'].dropna().iloc[0]
    # Use OpenStreetMap as the base layer ('Stamen Terrain' may be unavailable in recent folium releases)
    my_map = folium.Map(location=map_center, zoom_start=2, tiles='OpenStreetMap')

    # Keep track of added universities to prevent duplicates
    added_universities = set()

    # Add markers for each city in the query result dataframe
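    for city_info in query_results['city'].unique():
        # Get the coordinates and country for the current city
        city_data = city_country_coordinates.loc[city_country_coordinates['City'] == city_info]
        # Skip cities that are missing from the lookup table or could not be geocoded
        if city_data.empty or city_data['Coordinates'].values[0] is None:
            continue
        city_coordinates = city_data['Coordinates'].values[0]
        city_country = city_data['Country'].values[0]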

        # Get master courses information for the current city
        city_courses = query_results[query_results['city'] == city_info]

        # Create a label for the marker with master courses information
        label = f'<h3 style="text-align:center; color: blue;">{city_info}, {city_country}</h3><br>'

        # Create a set to store added universities for the current city
        added_universities_city = set()

        for _, course in city_courses.iterrows():
            university = course["universityName"]

            # Check if the university has already been added for this city
            if university not in added_universities and university not in added_universities_city:
                label += f'<p><b>University:</b> {university}</p>'
                label += f'<p><b>Course:</b> {course["courseName"]}</p>'

                # Show the fee when it is known; otherwise point the reader to the website
                fee = course["fees_USD"]
                fees_text = f'<p><b>Tuition in USD:</b> ${fee:.2f}</p>' if pd.notna(fee) and fee != 0 else \
                            '<p><b>Tuition in USD:</b> Visit the website</p>'
                label += fees_text

                url_text = f'<p><b>URL:</b> <a href="{course["url"]}" style="color: green;" target="_blank">{course["url"]}</a></p>'
                label += f'{url_text}<br><br>'

                # Mark the university as added for this city
                added_universities.add(university)
                added_universities_city.add(university)

        # Create and add the marker to the map
        iframe = IFrame(html=label, width=300, height=200)
        popup = folium.Popup(iframe, max_width=300)
        folium.Marker(location=city_coordinates, popup=popup, icon=folium.Icon(color='red', icon='info-sign')).add_to(my_map)

    # Display the map in the notebook
    display(my_map)


In [None]:
# Example usage
create_map(df_city_country_coordinates, query_result)
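
The assignment asks for a proper legend, which the map above does not yet include. One way to add it is to build a small HTML box and attach it to the map. The sketch below only builds the HTML; the helper name `build_legend_html`, the fee bands, and the colors are illustrative assumptions.

```python
from html import escape


def build_legend_html(title, entries):
    """Return an HTML snippet for a fixed-position legend.

    entries is a list of (color, description) pairs.
    """
    # One row per entry: a small colored square followed by its description
    rows = "".join(
        f'<div><span style="display:inline-block;width:12px;height:12px;'
        f'background:{escape(color)};margin-right:6px;"></span>{escape(text)}</div>'
        for color, text in entries
    )
    # Wrap the rows in a box pinned to the bottom-left corner of the page
    return (
        '<div style="position:fixed;bottom:30px;left:30px;z-index:9999;'
        'background:white;padding:8px;border:1px solid #888;font-size:13px;">'
        f'<b>{escape(title)}</b>{rows}</div>'
    )


# Hypothetical fee bands, for illustration only
legend_html = build_legend_html(
    "Tuition (USD)",
    [("green", "up to 10,000"), ("orange", "10,000 to 25,000"),
     ("red", "above 25,000"), ("gray", "not available")],
)
```

With folium, the snippet can be attached via `my_map.get_root().html.add_child(folium.Element(legend_html))`. Since `create_map` displays the map itself, that call would need to go inside the function, before `display(my_map)`.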

# Map guide
This map lets students visually explore the geographical distribution of the master's degree programs returned by their query.

A marker is placed on each city where a matching master's degree program is offered, so students can see at a glance where the programs are located.

Clicking a marker opens a pop-up window with the city and country, followed by the details of the programs offered there: the university name, the course name, the tuition fee in USD (when available), and a clickable URL to the official program website.
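
Since the map is meant to help students judge how far one university is from another, a rough distance between two markers can also be computed from their coordinates. The sketch below uses the haversine (great-circle) formula with a spherical Earth; `haversine_km` is a hypothetical helper name. The `geodesic` function already imported from geopy gives a more accurate ellipsoidal distance via `geodesic(a, b).km`.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(coord_a, coord_b):
    """Great-circle distance in kilometers between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, coord_a)
    lat2, lon2 = map(math.radians, coord_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

For example, `haversine_km((49.8955367, -97.1384584), (50.8550018, 4.3512333761166175))` estimates the distance between the Winnipeg and Brussels markers from the table above.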