# <span style="color:blue"> WOMEN IN TECH - COLLEGES - MAP REPRESENTATION</span><br>

## <span style="color:blue"> HND IN WEB DEVELOPMENT</span><br>

##### __By__ Ester Giménez

### <span style="color:blue"> Introduction</span><br>

The database was built manually. Campus addresses were taken from the colleges' websites or, when a website did not list them, from Google Maps. A college usually covers a large area and has several campuses within one administrative boundary. For example, Forth Valley College has campuses in:</br>

- Stirling</br>
- Falkirk</br>
- Clackmannanshire</br>

In total, I have the addresses of 58 campuses. This excludes the colleges of Art and Music (conservatories, etc.) and the colleges considered university campuses of the University of the Highlands and Islands (13 colleges in total).</br>

### <span style="color:blue"> Map of the campuses of all colleges in Scotland:</span><br>

First, I will import the libraries and read the datasets.

In [1]:
import pandas as pd
import folium
from folium.plugins import MarkerCluster
import requests

# Read the raw file: a github.com/.../tree/... link returns a web page, not the spreadsheet
# (pandas needs the openpyxl package installed to read .xlsx files)
address_list = pd.read_excel("https://raw.githubusercontent.com/EsterGM/Women-In-Tech/main/Colleges/Coll-Addresses.xlsx")

In [2]:
address_list.head()

| Unnamed: 0 | Region | College | Campus | Address | Address2 | Location | Postcode | Total Address | Telf | email | web / Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aberdeen and Aberdeenshire | North East Scotland College | Aberdeen Altens Campus | Hareness Road | Altens Industrial Estate | Aberdeen | AB12 3LE | Aberdeen Altens Campus, Hareness Road, Altens ... | 01224 612704 |  | www.nescol.ac.uk |
| 1 | Aberdeen and Aberdeenshire | North East Scotland College | Aberdeen City Campus | Gallowgate |  | Aberdeen | AB25 1BN | Aberdeen City Campus, Gallowgate, Aberdeen AB... | 01224 612330 | studentadvice@nescol.ac.uk |  |
| 2 | Aberdeen and Aberdeenshire | North East Scotland College | Ellon Learning Centre | Ellon Academy Community Campus | Kellie Pearl Way, Cromleybank | Ellon | AB41 8LF | Ellon Learning Centre, Ellon Academy Community... |  |  |  |
| 3 | Aberdeen and Aberdeenshire | North East Scotland College | Fraserburgh Campus | Henderson Road |  | Fraserburgh | AB43 9GA | Fraserburgh Campus, Henderson Road, Fraserburg... | 01346 586129 |  |  |
| 4 | Aberdeen and Aberdeenshire | North East Scotland College | Inverurie Learning Centre | Crichie Cottage |  | Inverurie | AB51 3SW | Inverurie Learning Centre, Crichie Cottage, In... | 01467 623651 |  |  |


The Folium library needs geographical coordinates to place the campuses in the dataset on a map. OpenStreetMap (OSM) is a collaborative, openly licensed map of the world built by volunteers, and Nominatim is its geocoding search engine: given an address, it returns the matching places with their coordinates. I will use Nominatim to obtain the coordinates of all campuses.
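For reference, here is a minimal sketch of the request that `query_address` sends, built with the standard library only. `nominatim_search_url` is a hypothetical helper, not part of this notebook. `requests` encodes the `params` dictionary into a query string in essentially the same way. Nominatim answers with a JSON list of matches, each with fields such as `lat` and `lon` (returned as strings) and `type`. An address with no match gives an empty list. Its usage policy allows at most one request per second and asks for a User-Agent that identifies the application.

```python
from urllib.parse import urlencode

# Nominatim search endpoint, as used in query_address
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def nominatim_search_url(address, region="Scotland"):
    """Build the free-form search URL for one address (hypothetical helper).

    Appending the region, as query_address does with ", Scotland",
    helps keep Nominatim from matching a place of the same name elsewhere.
    """
    params = {"q": "{}, {}".format(address, region), "format": "json"}
    return NOMINATIM_SEARCH + "?" + urlencode(params)


url = nominatim_search_url("Ayr Campus, Dam Park, Ayr KA8 0EU")
print(url)
# https://nominatim.openstreetmap.org/search?q=Ayr+Campus%2C+Dam+Park%2C+Ayr+KA8+0EU%2C+Scotland&format=json
```

Spaces become `+` and commas become `%2C`, so the complete address travels as a single `q` value.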

In [3]:
import time

def query_address(address):
    """
    Return the response from the OpenStreetMap database, through its geocoder Nominatim.
    Parameter: address - address of the premises
       It works with the complete address of every college in one cell;
       if the address is split into parts in different cells, it does not work.
       ", Scotland" is appended, so that Nominatim looks for the address in Scotland.
    Returns: result - list of matches decoded from the JSON response (each with
       'lat', 'lon', 'type', ...); empty when nothing is found or the request fails
    """
    url = "https://nominatim.openstreetmap.org/search"
    parameters = {'q': '{}, Scotland'.format(address), 'format': 'json'}
    # Nominatim's usage policy asks for at most one request per second and a
    # User-Agent that identifies the application (the value below is a placeholder)
    headers = {'User-Agent': 'women-in-tech-colleges-map'}
    time.sleep(1)
    response = requests.get(url, params=parameters, headers=headers, timeout=30)

    # If a request fails, return an empty result instead of stopping,
    # so that the coordinates of the remaining addresses are still looked up
    if response.status_code != 200:
        print("Error querying {}".format(address))
        result = {}
    else:
        result = response.json()
    return result

In [4]:
# Apply the function to the column holding the complete address, "Total Address"
address_list['json'] = address_list['Total Address'].map(query_address)

In [5]:
# Keep only the rows where Nominatim found at least one match, and count them
df1 = address_list[address_list['json'].map(len) > 0].copy()
print(df1.shape[0])

25


Of the 57 campuses in my list, Nominatim found only 25 in OpenStreetMap, which is a poor result.</br>
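The campuses that Nominatim missed are the ones to look up by hand. Here is a short pandas sketch of how to list them and compute the match rate. It uses a hypothetical three-row stand-in for `address_list` after the queries. Aberdeen City Campus is used as the unmatched example because it is absent from the `df1` output. With the real data, the same calculation gives 25 / 57, about 43.9%.

```python
import pandas as pd

# Hypothetical stand-in for address_list after the Nominatim queries:
# an empty list in 'json' means Nominatim found no match for that campus
address_list = pd.DataFrame({
    "Campus": ["Aberdeen Altens Campus", "Aberdeen City Campus", "Ayr Campus"],
    "json": [[{"lat": "57.1160435", "lon": "-2.0766143400926445", "type": "college"}],
             [],
             [{"lat": "55.458407750000006", "lon": "-4.611970264361494", "type": "university"}]],
})

found = address_list["json"].map(len) > 0
matched = address_list[found]
unmatched = address_list.loc[~found, "Campus"].tolist()

print("Match rate: {:.1f}%".format(100 * found.mean()))
print("To look up manually:", unmatched)
```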

In [6]:
# Extract the relevant fields from the first match of each API response (JSON) and add them as new columns
df1['lat'] = df1['json'].map(lambda x: x[0]['lat'])
df1['lon'] = df1['json'].map(lambda x: x[0]['lon'])
df1['type'] = df1['json'].map(lambda x: x[0]['type'])

df1

| Unnamed: 0 | Region | College | Campus | Address | Address2 | Location | Postcode | Total Address | Telf | email | web / Notes | json | lat | lon | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aberdeen and Aberdeenshire | North East Scotland College | Aberdeen Altens Campus | Hareness Road | Altens Industrial Estate | Aberdeen | AB12 3LE | Aberdeen Altens Campus, Hareness Road, Altens ... | 01224 612704 |  | www.nescol.ac.uk | [{'place_id': 220771362, 'licence': 'Data © Op... | 57.1160435 | -2.0766143400926445 | college |
| 3 | Aberdeen and Aberdeenshire | North East Scotland College | Fraserburgh Campus | Henderson Road |  | Fraserburgh | AB43 9GA | Fraserburgh Campus, Henderson Road, Fraserburg... | 01346 586129 |  |  | [{'place_id': 105539525, 'licence': 'Data © Op... | 57.68555875 | -2.026723233888845 | college |
| 5 | Aberdeen and Aberdeenshire | North East Scotland College | Scottish Maritime Academy | South Road |  | Peterhead | AB42 2UP | Scottish Maritime Academy, South Road, Peterhe... |  |  |  | [{'place_id': 220760300, 'licence': 'Data © Op... | 57.4959129 | -1.7962036409976 | college |
| 6 | Ayrshire | Ayrshire College | Ayr Campus | Dam Park |  | Ayr | KA8 0EU | Ayr Campus, Dam Park, Ayr KA8 0EU | 0300 303 0303 |  | www.ayrshire.ac.uk | [{'place_id': 104220181, 'licence': 'Data © Op... | 55.458407750000006 | -4.611970264361494 | university |
| 7 | Ayrshire | Ayrshire College | Kilmarnock Campus | Hill Street |  | Kilmarnock | KA1 3HY | Kilmarnock Campus, Hill Street, Kilmarnock KA1... | 0300 303 0303 |  |  | [{'place_id': 167979610, 'licence': 'Data © Op... | 55.613721850000005 | -4.5005754168018175 | college |
| 12 | Borders | Borders College | Newtown St Boswells | Borders College |  | Newtown St Boswells | TD6 0PL | Borders College, Newtown St Boswells TD6 0PL | 08700 50 51 52 | Email: enquiries@borderscollege.ac.uk | no further address details | [{'place_id': 221705255, 'licence': 'Data © Op... | 55.5741998 | -2.666389818829349 | college |
| 14 | Forth Valley | Forth Valley College | Alloa Campus | Devon Road |  | Alloa | FK10 1PX | Alloa Campus, Devon Road, Alloa FK10 1PX |  |  | www.forthvalley.ac.uk | [{'place_id': 148555983, 'licence': 'Data © Op... | 56.1157971 | -3.7861990176212754 | college |
| 15 | Forth Valley | Forth Valley College | Falkirk Campus | Grangemouth Road |  | Falkirk | FK2 9AD | Falkirk Campus, Grangemouth Road, Falkirk FK2 9AD |  |  |  | [{'place_id': 219109333, 'licence': 'Data © Op... | 56.0057521 | -3.7678314021947896 | college |
| 16 | Forth Valley | Forth Valley College | Raploch Campus | Drip Road |  | Stirling | FK8 1RD | Raploch Campus, Drip Road, Stirling FK8 1RD |  |  |  | [{'place_id': 143319213, 'licence': 'Data © Op... | 56.129097650000006 | -3.949888080510205 | university |
| 17 | Forth Valley | Forth Valley College | Stirling Campus | Drip Road |  | Stirling | FK8 1SE | Stirling Campus, Drip Road, Stirling FK8 1SE |  |  |  | [{'place_id': 143332967, 'licence': 'Data © Op... | 56.13294865 | -3.960350435662434 | college |


The "type" column is useful here because it reports the kind of premises that OpenStreetMap matched to each address. OpenStreetMap depends on volunteer contributions, so it is not surprising that some campuses are not registered and that some are catalogued incorrectly:</br>

- Ayr Campus is catalogued as a university.</br>
- Raploch Campus in Stirling is catalogued as a university.</br>
- Barony College in Dumfries is catalogued as a bus stop.</br>
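Here is a quick way to spot such cases, sketched in pandas on a subset of the `df1` rows. The campus names and types are copied from the `df1` output. On the full result, the same filter would also surface the bus stop.

```python
import pandas as pd

# Campus names and OSM 'type' values copied from the df1 output above (a subset)
df1 = pd.DataFrame({
    "Campus": ["Aberdeen Altens Campus", "Ayr Campus", "Kilmarnock Campus", "Raploch Campus"],
    "type": ["college", "university", "college", "university"],
})

# How many matches of each type Nominatim returned
print(df1["type"].value_counts())

# Campuses whose matched feature is not a college need to be checked by hand
suspect = df1.loc[df1["type"] != "college", "Campus"].tolist()
print(suspect)  # ['Ayr Campus', 'Raploch Campus']
```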

The dataset is too small to justify three marker clusters in the code (one for each type: college, university and bus stop). Besides, there is more manual work to do:</br>

- Find the coordinates of the 32 campuses not included in OpenStreetMap with Google Maps: when a search pinpoints the site, the URL of the page changes to include the coordinates, which can then be copied.
- Manually update the "type" of every campus in the dataset to "college".
- Copy the coordinates found in this notebook into the dataset.
- Some longitude values contain two decimal parts (for example, "-24.556578,17"), which Folium cannot read. The second part (the ",17") was deleted manually.</br>
- Delete the address columns from the Excel file, to avoid UTF-8 errors when drawing the maps.</br>
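The fixes to the type and to the longitude can also be done in pandas instead of by hand. Here is a minimal sketch on a hypothetical three-row frame. It assumes the pasted longitudes are read as text, because the `.str` accessor fails on a numeric column. It also assumes that everything after the first comma is the extra part to drop, as in the example above.

```python
import pandas as pd

# Hypothetical rows: LONGITUDE holds text as pasted by hand, and the middle value
# repeats the example with a second decimal part ("-24.556578,17")
campuses = pd.DataFrame({
    "type": ["college", "university", "college"],
    "LONGITUDE": ["-3.960350435662434", "-24.556578,17", "-4.5005754168018175"],
})

# Keep only the text before the first comma, then convert to float so Folium can read it
campuses["LONGITUDE"] = campuses["LONGITUDE"].str.split(",").str[0].astype(float)

# Every site in the list is a college, whatever type OpenStreetMap reported
campuses["type"] = "college"

print(campuses)
```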

After these procedures, I have a new file with the basic data and the coordinates of all campuses, which will be used from now on.</br>

In [7]:
import folium
from folium.plugins import MarkerCluster
import pandas as pd

# Read the new csv file, after all updates (raw URL, so that pandas gets the file itself)
college_list = pd.read_csv('https://raw.githubusercontent.com/EsterGM/Women-In-Tech/main/Colleges/Coll-Addresses3.csv', encoding='utf-8')

# Longitude is read as a string, not a float
college_list['LONGITUDE'] = college_list['LONGITUDE'].astype(float)

# Approximate centre of Scotland
scotland_coordinates = (56.4907, -4.2026)

# Create an empty map centred on Scotland, inside a 975 x 560 frame
campus_map = folium.Map(width=975, height=560, location=scotland_coordinates, zoom_start=7)

# Add a marker cluster, because the Central Belt has too many points to see them properly.
# Naming the cluster lets the layer control toggle it on and off.
colleges = MarkerCluster(name='college').add_to(campus_map)

for _, row in college_list.iterrows():
    # Clicking a marker shows the name of the college and the campus;
    # zoom in to separate the clustered locations
    popup = str(row['College']) + '<br><br>' + str(row['Campus'])
    folium.Marker(location=[row['LATITUDE'], row['LONGITUDE']], popup=popup,
                  icon=folium.Icon(color='blue')).add_to(colleges)

# Fit the map to the area covered by the campuses:
# south-west = (min latitude, min longitude), north-east = (max latitude, max longitude)
sw = college_list[['LATITUDE', 'LONGITUDE']].min().values.tolist()
ne = college_list[['LATITUDE', 'LONGITUDE']].max().values.tolist()
campus_map.fit_bounds([sw, ne])

# Enable toggling of the data layers
folium.LayerControl().add_to(campus_map)
campus_map
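`fit_bounds` takes a south-west and a north-east corner. The cell takes the minimum and maximum of each column separately, so the corners it builds need not coincide with any campus. Here is a plain-Python sketch with coordinates copied from the `df1` output. `bounding_box` is a hypothetical helper. The simple min/max only works because Scotland does not straddle the 180° meridian.

```python
# (latitude, longitude) of a few campuses, copied from the df1 output above
points = [
    (57.1160435, -2.0766143400926445),         # Aberdeen Altens Campus
    (57.68555875, -2.026723233888845),         # Fraserburgh Campus
    (57.4959129, -1.7962036409976),            # Scottish Maritime Academy, Peterhead
    (55.458407750000006, -4.611970264361494),  # Ayr Campus
    (56.13294865, -3.960350435662434),         # Stirling Campus
]


def bounding_box(points):
    """Return [[south, west], [north, east]], the two corners passed to fit_bounds."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]


sw, ne = bounding_box(points)
print(sw, ne)
```

Here the north-east corner combines Fraserburgh's latitude with Peterhead's longitude, so it is a point where no campus lies. That is fine for framing the map.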

### <span style="color:blue"> Gender Distribution map comments</span><br>

The dataset obtained from the INFRACT website lists the names of the colleges. Between 2005 and 2015, some colleges and/or campuses disappeared or merged with other colleges. The following assumptions were made:</br>

- For colleges that no longer exist, the name (and coordinates) of the college that took them in are used.</br>
- For colleges that were merged into other colleges, the location does not change, so the coordinates of the original college/campus are used.</br>
- If no campus is specified, one is chosen for mapping purposes: if the college website gives a central office address, that campus is chosen. The report discusses this issue, and how the mergers took place, in more detail.</br>
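The first rule amounts to a lookup table from closed colleges to their successors. Here is a minimal sketch with pandas `Series.replace`. The college names are placeholders, and the real mergers are described in the report. Writing the result into a 'College chosen' column is only one possible way to fill that column, not necessarily how it was filled.

```python
import pandas as pd

# Hypothetical lookup from closed colleges to the college that took them in;
# placeholder names only
SUCCESSOR = {
    "College A": "College X",
    "College B": "College X",
}

infract = pd.DataFrame({
    "College": ["College A", "College X", "College C"],
    "Year": [2008, 2019, 2019],
})

# Colleges that are not keys of SUCCESSOR keep their own name
infract["College chosen"] = infract["College"].replace(SUCCESSOR)
print(infract["College chosen"].tolist())  # ['College X', 'College X', 'College C']
```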

Apart from this, we have the following:</br>

- Not all computing HNDs are offered on all campuses.</br>
- The HND titles in the dataset differ from the SQA award titles. This project uses the titles listed by the SQA.</br>

Using the assumptions above, the coordinates of the campuses were added manually to the dataset downloaded from Infract, and this modified file was used for the mapping. The same file without the coordinates was used for the graphs in the previous Jupyter notebook.</br>

This methodology can be applied to all HNDs taught in colleges. The csv files downloaded from Infract can be updated following the process used here (calculate the female percentages, add the geographical coordinates of the colleges, format the longitude, etc.).</br>
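Adding the coordinates to a new Infract download is a join on the college name. Here is a sketch with hypothetical frames. The college names, counts and coordinates are made up for illustration. `validate="many_to_one"` makes pandas raise an error if the coordinates table lists a college twice. The left join keeps colleges that still lack coordinates, so they are easy to find.

```python
import pandas as pd

# Hypothetical Infract-style rows (made-up counts) and a separate table of coordinates
gender = pd.DataFrame({
    "College chosen": ["College X", "College Y", "College Z"],
    "Year": [2019, 2019, 2019],
    "Female real": [2, 0, 1],
    "Male": [18, 12, 9],
})
coords = pd.DataFrame({
    "College chosen": ["College X", "College Y"],
    "LATITUDE": [55.86, 57.15],
    "LONGITUDE": [-4.25, -2.09],
})

# Left join: every Infract row is kept; validate raises if a college appears twice in coords
merged = gender.merge(coords, on="College chosen", how="left", validate="many_to_one")

# Colleges still without coordinates have to be looked up by hand
missing = merged.loc[merged["LATITUDE"].isna(), "College chosen"].unique().tolist()
print(missing)  # ['College Z']
```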

### <span style="color:blue"> Gender Distribution map in 2019</span><br>

The map below represents the gender distribution in colleges in Scotland in 2019. There are 10 sites offering the courses:</br>

- __reddish circles__: number of boys attending software development HND courses (the circle radius is set to the number of boys). </br>
- __blue circles__: number of girls in the same courses. </br>

Clicking a blue icon (with a white "i" inside) shows the name of the college and the percentage of girls attending, calculated as [female real / (female real + male)]*100. </br>
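The percentage shown in the popups can be computed for a whole column at once. Here is a sketch on made-up counts (not data from this study), assuming the column names 'Female real' and 'Male' used in the map cells. Writing the result to 'PER_FEM' mirrors the popup code but is an assumption about how that column was produced. For a site with no students at all (0 / 0), pandas returns NaN rather than raising an error.

```python
import pandas as pd

# Made-up counts for illustration only
y = pd.DataFrame({
    "College chosen": ["College X", "College Y", "College Z"],
    "Female real": [2, 0, 0],
    "Male": [18, 10, 0],
})

# [female real / (female real + male)] * 100, rounded to one decimal
y["PER_FEM"] = (y["Female real"] / (y["Female real"] + y["Male"]) * 100).round(1)
print(y)
```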

In [10]:
import folium
import pandas as pd

# Read the csv file completed by looking up the remaining coordinates manually.
# It also contains the gender distribution from 2005 to 2019.
college_list2 = pd.read_csv('https://raw.githubusercontent.com/EsterGM/Women-In-Tech/main/Colleges/HND-gender-WebDev-coord1.csv', encoding='utf-8')

# Longitude is read as a string, not a float
college_list2['LONGITUDE'] = college_list2['LONGITUDE'].astype(float)

# Extract year 2019
y2019 = college_list2[college_list2['Year'] == 2019]

# Approximate centre of Scotland
scotland_coordinates = (56.4907, -4.2026)

# Create an empty map centred on Scotland, inside a 975 x 560 frame
map_2019 = folium.Map(width=975, height=560, location=scotland_coordinates, zoom_start=7)

for _, row in y2019.iterrows():
    location = (row['LATITUDE'], row['LONGITUDE'])

    # Clicking a marker or a circle shows the college and the percentage of girls
    popup_text = str(row['College chosen']) + '<br><br>' + 'Girls: ' + str(row['PER_FEM']) + '%'

    folium.Marker(location=location, popup=popup_text, icon=folium.Icon(color='blue')).add_to(map_2019)

    # Girls: 1 is added to the count, because otherwise a "1" is not visible and a "0"
    # is over-represented; the offset does not change the overall picture
    folium.CircleMarker(location=location,
                        radius=row['Female real'] + 1,
                        color="#000080",  # dark blue
                        popup=popup_text,
                        fill=True).add_to(map_2019)

    # Boys: the radius equals the number of boys
    folium.CircleMarker(location=location,
                        radius=row['Male'],
                        color="#CC0033",  # red
                        popup=popup_text,
                        fill=True).add_to(map_2019)

# Fit the map to the area covered by the colleges
sw = y2019[['LATITUDE', 'LONGITUDE']].min().values.tolist()
ne = y2019[['LATITUDE', 'LONGITUDE']].max().values.tolist()
map_2019.fit_bounds([sw, ne])

map_2019

Comparing the two maps makes it very clear how few campuses, in theory, offer this HND, and how large the disproportion is between the boys (big red circles) and the girls (small blue circles) attending.</br>

The values of 1 and 0 are so low that their representation in the map cannot be seen.</br>
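The radius in pixels is set to the count itself, so a circle's area grows with the square of the count. This exaggerates large groups, while the smallest values almost vanish. A common alternative, not the one used in these maps, is to scale the radius with the square root of the count so that the area is proportional to it. A minimum radius then keeps 0 and 1 visible. `circle_radius`, `scale` and `min_radius` are hypothetical names and values to be tuned by eye. The result could be passed as `radius=` to `folium.CircleMarker`.

```python
import math


def circle_radius(count, scale=3.0, min_radius=2.0):
    """Radius in pixels whose circle area is proportional to count (hypothetical helper).

    The floor keeps circles for 0 and 1 students visible on the map.
    """
    return max(min_radius, scale * math.sqrt(count))


for n in (0, 1, 4, 16):
    print(n, circle_radius(n))
```

With this scaling, a site with four times as many students gets a circle with twice the radius and four times the area.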

### <span style="color:blue"> Gender Distribution map in 2018</span><br>

In [9]:
import folium
import pandas as pd

# Read the csv file completed by looking up the remaining coordinates manually.
# It also contains the gender distribution from 2005 to 2019.
college_list3 = pd.read_csv('https://raw.githubusercontent.com/EsterGM/Women-In-Tech/main/Colleges/HND-gender-WebDev-coord1.csv', encoding='utf-8')

# Longitude is read as a string, not a float
college_list3['LONGITUDE'] = college_list3['LONGITUDE'].astype(float)

# Extract year 2018
y2018 = college_list3[college_list3['Year'] == 2018]

# Approximate centre of Scotland
scotland_coordinates = (56.4907, -4.2026)

# Create an empty map centred on Scotland, inside a 975 x 560 frame
map_2018 = folium.Map(width=975, height=560, location=scotland_coordinates, zoom_start=7)

for _, row in y2018.iterrows():
    location = (row['LATITUDE'], row['LONGITUDE'])

    # Clicking a marker or a circle shows the college and the percentage of girls
    popup_text2 = str(row['College chosen']) + '<br><br>' + 'Girls: ' + str(row['PER_FEM']) + '%'

    folium.Marker(location=location, popup=popup_text2, icon=folium.Icon(color='blue')).add_to(map_2018)

    # Girls: 1 is added to the count, because otherwise a "1" is not visible and a "0"
    # is over-represented; the offset does not change the overall picture
    folium.CircleMarker(location=location,
                        radius=row['Female real'] + 1,
                        color="#000080",  # dark blue
                        popup=popup_text2,
                        fill=True).add_to(map_2018)

    # Boys: the radius equals the number of boys
    folium.CircleMarker(location=location,
                        radius=row['Male'],
                        color="#CC0033",  # red
                        popup=popup_text2,
                        fill=True).add_to(map_2018)

# Fit the map to the area covered by the colleges
sw = y2018[['LATITUDE', 'LONGITUDE']].min().values.tolist()
ne = y2018[['LATITUDE', 'LONGITUDE']].max().values.tolist()
map_2018.fit_bounds([sw, ne])

map_2018

### <span style="color:blue"> Conclusions</span><br>

As seen in other sections of this study, girls are under-represented in computing courses, in this case in Web Development. The 2019 and 2018 maps are just two visual examples of this gap in gender distribution.</br>

In 2019 and 2018, there were 10 centres offering Software Development and 6 offering Web Development. The courses are concentrated mostly in colleges in Glasgow, Edinburgh and Aberdeen, with some in Dundee and Motherwell. Stirling does not offer Web Development but has courses in Software Development.</br>

Maps can also be prepared for the years 2005 to 2017, but they would look similar to those in this notebook. The graphs in the previous notebook (by year, college and region) show gender gaps across all sites and years, and the maps would show them as well.</br>
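The 2019 and 2018 cells differ only in the year, so maps for 2005 to 2017 would mean copying the cell again for each year. Here is a minimal sketch of a hypothetical helper, `marker_records`, that collects what those cells draw for each college in a given year. It assumes the column names used in the map cells. The rows in `sample` are made up, and the folium drawing calls stay as in the cells above (shown only as a comment).

```python
import pandas as pd


def marker_records(df, year):
    """Collect, for one year, what the map cells draw for each college:
    marker position, popup text and the two circle radii (girls + 1, boys)."""
    records = []
    for _, row in df[df["Year"] == year].iterrows():
        records.append({
            "location": (row["LATITUDE"], float(row["LONGITUDE"])),
            "popup": "{}<br><br>Girls: {}%".format(row["College chosen"], row["PER_FEM"]),
            "female_radius": row["Female real"] + 1,
            "male_radius": row["Male"],
        })
    return records


# Hypothetical rows with the column names of HND-gender-WebDev-coord1.csv;
# names and counts are made up for illustration
sample = pd.DataFrame({
    "Year": [2019, 2019, 2018],
    "College chosen": ["College X", "College Y", "College X"],
    "LATITUDE": [55.86, 57.15, 55.86],
    "LONGITUDE": ["-4.25", "-2.09", "-4.25"],
    "Female real": [2, 0, 1],
    "Male": [18, 12, 15],
    "PER_FEM": [10.0, 0.0, 6.2],
})

# Each record would then feed the same folium calls as before, for example:
# for rec in marker_records(college_list2, 2017):
#     folium.CircleMarker(location=rec["location"], radius=rec["female_radius"],
#                         color="#000080", popup=rec["popup"], fill=True).add_to(map_2017)
records_2019 = marker_records(sample, 2019)
print(records_2019)
```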