Throughout this notebook I use an API, unidbapi, to look at the demographics of UK universities and analyse the gender split across students. The API only returns figures for female and male, so the analysis is limited to those two categories.

### unidbapi endpoints:
Initially, we will be focussing on the following two endpoints:
- University demographics
    - https://unidbapi.com/api/university/read_demographics?u={university name}&key={API key}
    - https://unidbapi.com/api/university/read_demographics?u={university name}&key=56d0b15602
- Degree demographics
    - https://unidbapi.com/api/degree/read_demographics?d={degree}&key={API key}
    - https://unidbapi.com/api/degree/read_demographics?d={degree}&key=56d0b15602

The documentation doesn't list the universities or degrees that the API covers. So, we will create our own lists of universities and degrees and test each entry against the API to see whether it returns anything other than an "Invalid Query" error.

In [52]:
f = open("uniDB_api.txt", "r")

In [53]:
api_key = f.readline().strip()  # strip the trailing newline so it does not end up in the URL
f.close()

### Degrees

In [45]:
import requests
import json
from pprint import pprint as pp
import numpy as np

I have manually created a list of degrees from: https://www.britishuni.com/subject-guide/subject-list

In [26]:
degrees = ['Archaeology', 'Architecture', 'Art', 'Business', 'Chemistry', 'Engineering', 'History', 'Computer Science',
           'Creative Writing', 'Criminology', 'Dentistry', 'Drama', 'Dance', 'Economics', 'Education', 'Electrical Engineering',
           'Biology', 'Mathematics', 'English', 'French'] # to be completed, https://www.britishuni.com/subject-guide/subject-list

valid_degrees = []

Create functions to assist with extraction:

In [27]:
def degree_endpoint(degree):
    endpoint = "https://unidbapi.com/api/degree/read_demographics?d={}&key={}".format(degree, api_key)
    return endpoint
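
Degree names such as 'Computer Science' contain spaces, so it may help to percent-encode the query string explicitly rather than rely on the HTTP library to do it. The following is a minimal sketch, assuming the same URL and parameter names (`d`, `key`) listed at the top of this notebook. The helper name `build_degree_url` and the placeholder key are illustrative only and are not part of the API.

```python
from urllib.parse import urlencode, quote

BASE_DEGREE_URL = "https://unidbapi.com/api/degree/read_demographics"

def build_degree_url(degree, key):
    """Return the degree endpoint with percent-encoded query parameters."""
    # quote_via=quote encodes spaces as %20 rather than '+'
    query = urlencode({"d": degree, "key": key}, quote_via=quote)
    return "{}?{}".format(BASE_DEGREE_URL, query)

# "YOUR_API_KEY" is a placeholder, not a real key
print(build_degree_url("Computer Science", "YOUR_API_KEY"))
```

The same approach would apply to the university endpoint, where names often contain spaces, commas and colons.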

In [38]:
for degree in degrees:
    response = requests.get(degree_endpoint(degree))
    # keep the degree unless the API reports an invalid query
    if response.json() != {"error": "Invalid Query"}:
        valid_degrees.append(degree)
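
The validity check can also be pulled into a small helper so that the degree and university loops share it. This is a sketch only: the one error payload seen in this notebook is `{"error": "Invalid Query"}`, and the helper assumes (without confirmation from the documentation) that any other error would also arrive as a dictionary with an `"error"` key. The name `is_valid_payload` is hypothetical.

```python
def is_valid_payload(payload):
    """Return True when a decoded JSON payload does not look like an error."""
    # Assumption: errors are dictionaries that carry an "error" key
    if isinstance(payload, dict) and "error" in payload:
        return False
    return True

# Offline examples using payload shapes seen in this notebook
print(is_valid_payload({"error": "Invalid Query"}))
print(is_valid_payload([{"degree": "History", "year": "2014"}]))
```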

Print the API response for each valid degree:

In [29]:
for degree in valid_degrees:
    endpoint = degree_endpoint(degree)

    response = requests.get(endpoint)
    pp(response.json())

[{'degree': 'Architecture',
  'female_grads': '6700',
  'male_grads': '11010',
  'year': '2014'},
 {'degree': 'Architecture',
  'female_grads': '6975',
  'male_grads': '10790',
  'year': '2015'},
 {'degree': 'Architecture',
  'female_grads': '7270',
  'male_grads': '10905',
  'year': '2016'},
 {'degree': 'Architecture',
  'female_grads': '7810',
  'male_grads': '11380',
  'year': '2017'},
 {'degree': 'Architecture',
  'female_grads': '8170',
  'male_grads': '11255',
  'year': '2018'}]
[{'degree': 'History',
  'female_grads': '13450',
  'male_grads': '11925',
  'year': '2014'},
 {'degree': 'History',
  'female_grads': '13855',
  'male_grads': '12145',
  'year': '2015'},
 {'degree': 'History',
  'female_grads': '14260',
  'male_grads': '12420',
  'year': '2016'},
 {'degree': 'History',
  'female_grads': '14525',
  'male_grads': '12275',
  'year': '2017'},
 {'degree': 'History',
  'female_grads': '14660',
  'male_grads': '12175',
  'year': '2018'}]
[{'degree': 'Computer Science',
  'femal

### Universities

I have manually created a list of UK universities by extracting the names from an online list: https://www.university-list.net/uk/universities-1000.html

In [30]:
unis = ['The University of Aberdeen', 'University of Abertay Dundee', 'Aberystwyth: United Theological College and College of Welsh Independents', 'The University of Wales', 'Anglia Polytechnic University', 'Askham Bryan College', 'Aston University', 'Aylesbury College', 'University of Wales, Bangor', 'Barking and Dagenham College', 'Barnsley College', 'Basingstoke College of Technology', 'University of Bath', 'Bath Spa University College', 'The University of Birmingham', 'Birmingham College of Food, Tourism and Creative Studies', 'Bishop Burton College', 'Bishop Grosseteste College', 'Blackburn College' ]#, need to fill more out using https://www.university-list.net/uk/universities-1000.html
valid_unis = []
invalid_unis = []

See which of the above are valid for the API:

In [39]:
def uni_endpoint(uni):
    # use the key read from file, as degree_endpoint does, rather than a hard-coded one
    endpoint = "https://unidbapi.com/api/university/read_demographics?u={}&key={}".format(uni, api_key)
    return endpoint

In [40]:
for uni in unis:
    endpoint = uni_endpoint(uni)
    response = requests.get(endpoint)
    if json.dumps(response.json()) != "{\"error\": \"Invalid Query\"}":
        valid_unis.append(uni)
    else:
        invalid_unis.append(uni)

Sometimes universities are referenced with 'The' in front of the name. So, to check as many as possible, I will retry the invalid names with 'The' in front:

In [32]:
for uni in invalid_unis:
    try_the = 'The ' + uni
    endpoint = uni_endpoint(try_the)
    response = requests.get(endpoint)
    if json.dumps(response.json()) != "{\"error\": \"Invalid Query\"}":
        # store the name that the API actually accepted
        valid_unis.append(try_the)
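
Several names in the list already start with 'The', so it may also be worth trying the name without it. A minimal sketch of generating both spellings is below; `name_variants` is a hypothetical helper, and whether the API accepts either form for a given university is not something the documentation confirms.

```python
def name_variants(name):
    """Return the name as given, plus the variant with or without a leading 'The '."""
    prefix = "The "
    if name.startswith(prefix):
        return [name, name[len(prefix):]]
    return [name, prefix + name]

print(name_variants("University of Bath"))
print(name_variants("The University of Aberdeen"))
```

Each variant could then be passed to `uni_endpoint` in turn, stopping at the first one that does not return an error.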

Print the valid universities for assurance:

In [33]:
for uni in valid_unis:
    print(uni)

The University of Aberdeen
University of Abertay Dundee
Aberystwyth: United Theological College and College of Welsh Independents
The University of Wales
Anglia Polytechnic University
Askham Bryan College
Aston University
Aylesbury College
University of Wales, Bangor
Barking and Dagenham College
Barnsley College
Basingstoke College of Technology
University of Bath
Bath Spa University College
The University of Birmingham
Birmingham College of Food, Tourism and Creative Studies
Bishop Burton College
Bishop Grosseteste College
Blackburn College


Now that we have a list of degrees and universities that are in the API's 'system', we can begin reading the API for data.

The degree endpoint returns a list of records, one per year (2014 to 2018 in the output above), each with the following fields:
- "degree": {degree as a string},
- "male_grads": "number as a string",
- "female_grads": "number as a string",
- "year": {as a string}

And the university endpoint returns the following once:
- "name": "Anglia Ruskin University",
- "male_grads": "36",
- "female_grads": "64",
- "male_staff": "60",
- "female_staff": "40",
- "year": "2020"
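
Because every numeric field arrives as a string, it is convenient to convert a record once before doing any arithmetic on it. The sketch below uses the Anglia Ruskin University response shown in this notebook as sample input; `parse_university_record` is a hypothetical helper name, and it assumes every record carries all six fields.

```python
NUMERIC_FIELDS = ("male_grads", "female_grads", "male_staff", "female_staff", "year")

def parse_university_record(record):
    """Return a copy of a university record with numeric strings converted to int."""
    parsed = {"name": record["name"]}
    for field in NUMERIC_FIELDS:
        parsed[field] = int(record[field])
    return parsed

sample = {
    "name": "Anglia Ruskin University",
    "male_grads": "36",
    "female_grads": "64",
    "male_staff": "60",
    "female_staff": "40",
    "year": "2020",
}
parsed_sample = parse_university_record(sample)
print(parsed_sample)
```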

# Practise atm - needs looping & editing
Let's try getting parts of the response:

Degree

In [None]:
def ratio_female_to_male(males, females):
    ratio = females / males
    return ratio

In [51]:
degree_average_ratio = {}

In [None]:
for degree in valid_degrees:
    endpoint = degree_endpoint(degree)
    records = requests.get(endpoint).json()

    ratios_for_this_degree = []

    for record in records:
        # the API returns counts as strings, so convert them before dividing
        ratio = ratio_female_to_male(int(record['male_grads']), int(record['female_grads']))
        ratios_for_this_degree.append(ratio)

    # average the yearly female-to-male ratios for this degree
    degree_average = np.average(np.array(ratios_for_this_degree))
    degree_average_ratio[degree] = degree_average
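
To check the calculation without calling the API, the same averaging can be run on the Architecture records printed earlier in this notebook. This is an offline sketch; the function name `average_female_to_male` is illustrative, and it uses `statistics.mean` in place of NumPy purely to stay dependency-free.

```python
from statistics import mean

# Architecture records copied from the API output shown above
architecture = [
    {"degree": "Architecture", "female_grads": "6700", "male_grads": "11010", "year": "2014"},
    {"degree": "Architecture", "female_grads": "6975", "male_grads": "10790", "year": "2015"},
    {"degree": "Architecture", "female_grads": "7270", "male_grads": "10905", "year": "2016"},
    {"degree": "Architecture", "female_grads": "7810", "male_grads": "11380", "year": "2017"},
    {"degree": "Architecture", "female_grads": "8170", "male_grads": "11255", "year": "2018"},
]

def average_female_to_male(records):
    """Return the mean of the yearly female-to-male graduate ratios."""
    ratios = [int(r["female_grads"]) / int(r["male_grads"]) for r in records]
    return mean(ratios)

avg = average_female_to_male(architecture)
print(round(avg, 3))
```

A value below 1 means fewer female than male graduates on average over the years returned.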

Now that we have a dictionary of average female-to-male ratios per degree, let's plot the degrees against their ratios to perform some analysis.

### University

In [36]:
endpoint = "https://unidbapi.com/api/university/read_demographics?u=Anglia%20Ruskin%20University&key={}".format(api_key)
# response = requests.get(endpoint)
data = requests.get(endpoint).json()
print(data)

{'name': 'Anglia Ruskin University', 'male_grads': '36', 'female_grads': '64', 'male_staff': '60', 'female_staff': '40', 'year': '2020'}


In [37]:
data['name']

'Anglia Ruskin University'

All university demographic responses have the same six fields. Because each response is a dictionary, I will read `female_grads` and `female_staff` by key rather than by position (they are the third and fifth fields).
Although the focus is on students, the API also reports the percentage of staff who are female, so I will run the same analysis on staff to see whether the two are correlated.

In [42]:
female_grad_percentage_per_uni = []
female_staff_percentage_per_uni = []

In [None]:
for university in valid_unis:
    endpoint = uni_endpoint(university)
    data = requests.get(endpoint).json()
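
One way the loop above could continue is to read the two female percentages from each response and append them to the lists. The sketch below runs offline on the Anglia Ruskin University response shown earlier instead of calling the API; `female_percentages` and the `sample_payloads` list are hypothetical names used for illustration.

```python
def female_percentages(payload):
    """Return (female_grads, female_staff) from a university payload as integers."""
    return int(payload["female_grads"]), int(payload["female_staff"])

# Stand-in for the payloads that requests.get(...).json() would return
sample_payloads = [
    {"name": "Anglia Ruskin University", "male_grads": "36", "female_grads": "64",
     "male_staff": "60", "female_staff": "40", "year": "2020"},
]

grad_percentages = []
staff_percentages = []
for payload in sample_payloads:
    grads, staff = female_percentages(payload)
    grad_percentages.append(grads)
    staff_percentages.append(staff)

print(grad_percentages, staff_percentages)
```

Keeping the two lists in the same order means each index refers to the same university, which is what a later correlation between them would need.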
