# Accredited Courses

In many professions, a particular accredited degree (or equivalent) is required for entry into certain professional grades.

There are three datasets that seem to reference accreditation, so let's have a quick peek at them.

First, the `ACCREDITATION` table:

In [15]:
!head -n 2 on_2021_08_11_07_24_51/ACCREDITATION.csv

PUBUKPRN,UKPRN,KISCOURSEID,KISMODE,ACCTYPE,ACCDEPEND,ACCDEPENDURL,ACCDEPENDURLW
10000163,10000163,MCCF-B320,1,05701,0,,


That looks like the raw data. It gives us an institution, a course ID, and an accreditation type.

So does the `ACCREDITATIONTABLE` table help us decode the accreditation type?

In [16]:
!head -n 2 on_2021_08_11_07_24_51/ACCREDITATIONTABLE.csv  

ACCURL,ACCTEXT,ACCTEXTW,ACCTYPE
http://www.arb.org.uk/student,Prescribed by the Architects Registration Board (ARB) for the purpose of registration in the UK.,Achrededig gan Fwrdd Cofrestru Penseiri (ARB) at bwrpas cymhwysedd ar gyfer cofrestru gyda'r corff hwnnw.,00101


The `AccreditationByHep` table looks like it's intended for displaying human-readable text that explains the accreditation of a particular course to a website end user.

In [17]:
!head -n 2 on_2021_08_11_07_24_51/AccreditationByHep.csv

AccreditingBodyName,AccreditionType,HEP,KisCourseTitle,KiscourseID
A Greener Festival,A Greener Festival is recognised around the world by the event and festival industry and academia for its work in helping events and festivals to reduce their environmental impact. Accreditation by A Greener Festival means that this course provides students with the level and depth of theory and knowledge of current practice required by environmental practitioners in the event industry both in the UK and internationally.,(10008640) Falmouth University,BA (Hons) Creative Events Management,BACEMAFC


Let's see how we can join the base tables together, along with some course information and course population data.

To begin with, load in *pandas* in the conventional way.

In [18]:
import pandas as pd

Read in the two basic accreditation tables:

In [19]:
accreditation_df = pd.read_csv("on_2021_08_11_07_24_51/ACCREDITATION.csv")

accreditation_table_df = pd.read_csv("on_2021_08_11_07_24_51/ACCREDITATIONTABLE.csv")
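In the outputs that follow, `ACCTYPE` appears as `5701` even though the raw CSV has `05701`: `read_csv` infers an integer type for the column and the leading zero is lost. The merge still works because both tables are parsed the same way, but if you would rather keep the codes exactly as they appear in the files, one option is to pass `dtype`. This is a minimal sketch that uses small inline samples (trimmed to a few columns) in place of the real CSV files.

```python
import io

import pandas as pd

# Inline stand-ins for the first lines of ACCREDITATION.csv and
# ACCREDITATIONTABLE.csv, trimmed to a few columns for illustration
accreditation_csv = """UKPRN,KISCOURSEID,KISMODE,ACCTYPE
10000163,MCCF-B320,1,05701
"""
accreditation_table_csv = """ACCTYPE,ACCTEXT
00101,Prescribed by the Architects Registration Board (ARB)
05701,Recognised by the General Chiropractic Council (GCC)
"""

# Default parsing: ACCTYPE is inferred as an integer, so 05701 becomes 5701
default_df = pd.read_csv(io.StringIO(accreditation_csv))

# Reading ACCTYPE as a string preserves the leading zeros
codes_df = pd.read_csv(io.StringIO(accreditation_csv), dtype={"ACCTYPE": str})
codes_table_df = pd.read_csv(io.StringIO(accreditation_table_csv), dtype={"ACCTYPE": str})

print(default_df["ACCTYPE"].iloc[0])  # 5701
print(codes_df["ACCTYPE"].iloc[0])    # 05701

# Read the key the same way in both tables: pandas typically refuses to
# merge an integer key column with a string key column
merged = pd.merge(codes_df, codes_table_df, on="ACCTYPE")
print(merged[["KISCOURSEID", "ACCTYPE", "ACCTEXT"]])
```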

We can join these two tables together to give us a more explanatory dataset:

In [20]:
accreditation_df = pd.merge(accreditation_df,
                            accreditation_table_df[["ACCTEXT", "ACCTYPE"]],
                            on="ACCTYPE")

accreditation_df.head(3)

Unnamed: 0,PUBUKPRN,UKPRN,KISCOURSEID,KISMODE,ACCTYPE,ACCDEPEND,ACCDEPENDURL,ACCDEPENDURLW,ACCTEXT
0,10000163,10000163,MCCF-B320,1,5701,0,,,Recognised by the General Chiropractic Council...
1,10004078,10004078,5008A,1,5701,0,,,Recognised by the General Chiropractic Council...
2,10007161,10007161,CHIIMS_FT,1,5701,0,,,Recognised by the General Chiropractic Council...
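
One thing to be aware of: `pd.merge()` performs an inner join by default, so any `ACCREDITATION` row whose `ACCTYPE` has no entry in `ACCREDITATIONTABLE` is silently dropped. A quick way to check whether that is happening is a left join with `indicator=True`, which adds a `_merge` column recording where each row came from. The sketch below uses toy data; the code `99999` and course `XYZ1` are made up for illustration. The same check applies to any of the merges in this notebook.

```python
import pandas as pd

# Toy stand-ins: course XYZ1 has a (made-up) ACCTYPE code missing from the lookup
accreditation = pd.DataFrame({
    "KISCOURSEID": ["MCCF-B320", "5008A", "XYZ1"],
    "ACCTYPE": [5701, 5701, 99999],
})
accreditation_table = pd.DataFrame({
    "ACCTYPE": [5701, 601],
    "ACCTEXT": ["Recognised by the General Chiropractic Council...",
                "Accredited by the Association of Chartered Cer..."],
})

# pd.merge defaults to an inner join, so XYZ1 silently disappears
inner = pd.merge(accreditation, accreditation_table, on="ACCTYPE")

# A left join keeps every left-hand row; indicator=True adds a _merge column
checked = pd.merge(accreditation, accreditation_table, on="ACCTYPE",
                   how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]

print(len(inner))                         # 2
print(unmatched["KISCOURSEID"].tolist())  # ['XYZ1']
```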


*Ideally, it would be nice to have an accrediting body code and then a table identifying the accrediting bodies, but that might be something we could tease out of the data in a later conversation.*

We can also add in course and provider names.

Let's start with the provider names:

In [21]:
ukprns = pd.read_excel("UNISTATS_UKPRN_lookup_20160901.xlsx", "Lookup")


accreditation_df = pd.merge(accreditation_df, ukprns, on="UKPRN")

accreditation_df.head(3)

Unnamed: 0,PUBUKPRN,UKPRN,KISCOURSEID,KISMODE,ACCTYPE,ACCDEPEND,ACCDEPENDURL,ACCDEPENDURLW,ACCTEXT,NAME
0,10000163,10000163,MCCF-B320,1,5701,0,,,Recognised by the General Chiropractic Council...,AECC Chiropractic College
1,10004078,10004078,5008A,1,5701,0,,,Recognised by the General Chiropractic Council...,London South Bank University
2,10004078,10004078,4667A,1,601,0,,,Accredited by the Association of Chartered Cer...,London South Bank University


For the course name, we can merge on the `KISCOURSEID` *and* `UKPRN` columns taken together. This disambiguates any courses that share the same `KISCOURSEID` but are offered by different providers (if there are any).
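
To see why the compound key matters, here is a small sketch with toy data in which two providers happen to use the same course ID. The second provider's UKPRN (`10099999`) and its course are hypothetical. Merging on the course ID alone pairs each accreditation row with every course that shares the ID; merging on provider and course ID together keeps each course with its own provider.

```python
import pandas as pd

# Toy data: two providers using the same course ID "AB20"
# (UKPRN 10099999 and its course are hypothetical)
accreditation = pd.DataFrame({
    "UKPRN": [10000055, 10099999],
    "KISCOURSEID": ["AB20", "AB20"],
    "ACCTYPE": [601, 5701],
})
course_names = pd.DataFrame({
    "UKPRN": [10000055, 10099999],
    "KISCOURSEID": ["AB20", "AB20"],
    "TITLE": ["Animal Behaviour and Welfare", "Chiropractic"],
})

# Merging on the course ID alone pairs every row with every same-ID course
by_id = pd.merge(accreditation, course_names.drop(columns="UKPRN"), on="KISCOURSEID")

# Merging on the compound key keeps each course with its own provider
by_key = pd.merge(accreditation, course_names, on=["UKPRN", "KISCOURSEID"])

print(len(by_id))   # 4
print(len(by_key))  # 2
```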

In [22]:
course_df = pd.read_csv("on_2021_08_11_07_24_51/KISCOURSE.csv")

# Just get some key information from the course dataset
course_names = course_df[["UKPRN", "KISCOURSEID", "TITLE"]].drop_duplicates()

course_names.head(3)


Unnamed: 0,UKPRN,KISCOURSEID,TITLE
0,10001143,PSSFDOPTDIS,Ophthalmic Dispensing
1,10000055,AB20,Animal Behaviour and Welfare
2,10000055,AB29,Mechanical Engineering


The merge on this compound key is achieved by passing the compound key elements in a list:

In [23]:
accreditation_df = pd.merge(accreditation_df, course_names, on=["UKPRN", "KISCOURSEID"])

accreditation_df.head(3)

Unnamed: 0,PUBUKPRN,UKPRN,KISCOURSEID,KISMODE,ACCTYPE,ACCDEPEND,ACCDEPENDURL,ACCDEPENDURLW,ACCTEXT,NAME,TITLE
0,10000163,10000163,MCCF-B320,1,5701,0,,,Recognised by the General Chiropractic Council...,AECC Chiropractic College,Chiropractic
1,10004078,10004078,5008A,1,5701,0,,,Recognised by the General Chiropractic Council...,London South Bank University,Chiropractic
2,10004078,10004078,4667A,1,601,0,,,Accredited by the Association of Chartered Cer...,London South Bank University,Economics with Accounting


## Finding Accrediting Bodies

The `AccreditationByHep` file identifies recognising bodies by course:

In [24]:
accrediting_df = pd.read_csv("on_2021_08_11_07_24_51/AccreditationByHep.csv")

accrediting_df.head()

Unnamed: 0,AccreditingBodyName,AccreditionType,HEP,KisCourseTitle,KiscourseID
0,A Greener Festival,A Greener Festival is recognised around the wo...,(10008640) Falmouth University,BA (Hons) Creative Events Management,BACEMAFC
1,A Greener Festival,A Greener Festival is recognised around the wo...,(10008640) Falmouth University,BA (Hons) Sustainable Festival Management,BASUSFFFU01
2,A Greener Festival,A Greener Festival is recognised around the wo...,(10008640) Falmouth University,BA (Hons) Sustainable Tourism Management,BASUSTFFU01
3,Academy of Pharmaceutical Sciences,Accredited by the Academy of Pharmaceutical Sc...,(10007146) The University of Greenwich,BSc (Hons) Pharmaceutical Sciences,K0263
4,Academy of Pharmaceutical Sciences,Accredited by the Academy of Pharmaceutical Sc...,(10007146) The University of Greenwich,BSc (Hons) Pharmaceutical Sciences,K0263


So we can also merge that data in:

In [25]:
accreditation_df = pd.merge(accreditation_df,
                            accrediting_df[["KiscourseID", "KisCourseTitle", "AccreditingBodyName"]],
                            left_on=["KISCOURSEID"], right_on=['KiscourseID'])

accreditation_df.head(3)

Unnamed: 0,PUBUKPRN,UKPRN,KISCOURSEID,KISMODE,ACCTYPE,ACCDEPEND,ACCDEPENDURL,ACCDEPENDURLW,ACCTEXT,NAME,TITLE,KiscourseID,KisCourseTitle,AccreditingBodyName
0,10000163,10000163,MCCF-B320,1,5701,0,,,Recognised by the General Chiropractic Council...,AECC Chiropractic College,Chiropractic,MCCF-B320,MChiro (Hons) Chiropractic,General Chiropractic Council (GCC)
1,10004078,10004078,5008A,1,5701,0,,,Recognised by the General Chiropractic Council...,London South Bank University,Chiropractic,5008A,MChiro (Hons) Chiropractic,General Chiropractic Council (GCC)
2,10004078,10004078,4667A,1,601,0,,,Accredited by the Association of Chartered Cer...,London South Bank University,Economics with Accounting,4667A,BSc (Hons) Economics with Accounting,Association of Chartered Certified Accountants...
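
The merge above joins on `KiscourseID` alone, so it doesn't use the compound-key approach used for the course names. The `HEP` column appears to embed the provider's UKPRN in parentheses, e.g. `(10008640) Falmouth University`. Assuming that pattern holds for every row, the UKPRN could be pulled out with `str.extract` and used as part of the key. This is a sketch on toy rows shaped like `AccreditationByHep.csv`; the third accreditation row (UKPRN `10099999`) is hypothetical.

```python
import pandas as pd

# Toy rows in the shape of AccreditationByHep.csv (HEP embeds the UKPRN)
accrediting = pd.DataFrame({
    "AccreditingBodyName": ["A Greener Festival", "Academy of Pharmaceutical Sciences"],
    "HEP": ["(10008640) Falmouth University", "(10007146) The University of Greenwich"],
    "KiscourseID": ["BACEMAFC", "K0263"],
})
# Toy accreditation rows; UKPRN 10099999 is a hypothetical second provider
accreditation = pd.DataFrame({
    "UKPRN": [10008640, 10007146, 10099999],
    "KISCOURSEID": ["BACEMAFC", "K0263", "K0263"],
})

# Pull the digits inside the leading parentheses into an integer UKPRN column
# (assumes every HEP value starts with "(<digits>)"; otherwise astype(int) fails)
accrediting["UKPRN"] = (accrediting["HEP"]
                        .str.extract(r"^\((\d+)\)", expand=False)
                        .astype(int))

# Align the course ID column name, then merge on provider and course together
accrediting = accrediting.rename(columns={"KiscourseID": "KISCOURSEID"})
merged = pd.merge(accreditation, accrediting, on=["UKPRN", "KISCOURSEID"])

print(merged[["UKPRN", "KISCOURSEID", "AccreditingBodyName"]])
```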


Using this information, we could then start to produce lists of courses accredited by a particular body, as well as identify subject codes for courses accredited by a particular body. In turn, this would let us identify courses tagged with those subject codes that are *not* accredited. And so the data conversations can go... 
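
As a rough sketch of those two steps: first select the courses a given body accredits and collect their subject codes, then use a left join with `indicator=True` as an anti-join to find courses in the same subjects that have no matching accreditation. The data is toy data, and the `SUBJECT` column and its codes are hypothetical stand-ins for whatever subject coding the course data actually provides.

```python
import pandas as pd

# Toy data; SUBJECT and its codes are hypothetical stand-ins
courses = pd.DataFrame({
    "UKPRN": [1, 1, 2, 3],
    "KISCOURSEID": ["A1", "A2", "B1", "C1"],
    "SUBJECT": ["civil_eng", "civil_eng", "civil_eng", "economics"],
})
accredited = pd.DataFrame({
    "UKPRN": [1, 2],
    "KISCOURSEID": ["A1", "B1"],
    "AccreditingBodyName": ["Institution of Civil Engineers (ICE)"] * 2,
})

body = "Institution of Civil Engineers (ICE)"

# Courses accredited by the chosen body, with their subject codes attached
body_courses = pd.merge(accredited[accredited["AccreditingBodyName"] == body],
                        courses, on=["UKPRN", "KISCOURSEID"])
body_subjects = set(body_courses["SUBJECT"])

# Anti-join: courses in the same subjects with no matching accreditation row
candidates = courses[courses["SUBJECT"].isin(body_subjects)]
flagged = pd.merge(candidates, body_courses[["UKPRN", "KISCOURSEID"]],
                   on=["UKPRN", "KISCOURSEID"], how="left", indicator=True)
not_accredited = flagged[flagged["_merge"] == "left_only"].drop(columns="_merge")

print(body_courses["KISCOURSEID"].tolist())    # ['A1', 'B1']
print(not_accredited["KISCOURSEID"].tolist())  # ['A2']
```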

For example, which courses does the Institution of Civil Engineers accredit (or at least, appear to)?

In [27]:
civil_eng_filter = accreditation_df["AccreditingBodyName"]=="Institution of Civil Engineers (ICE)"

accreditation_df[civil_eng_filter].head(5)

Unnamed: 0,PUBUKPRN,UKPRN,KISCOURSEID,KISMODE,ACCTYPE,ACCDEPEND,ACCDEPENDURL,ACCDEPENDURLW,ACCTEXT,NAME,TITLE,KiscourseID,KisCourseTitle,AccreditingBodyName
231,10004078,10004078,191A,1,3002,0,,,Accredited by the Chartered Institution of Hig...,London South Bank University,Civil Engineering,191A,BEng (Hons) Civil Engineering,Institution of Civil Engineers (ICE)
232,10004078,10004078,191A,1,3002,0,,,Accredited by the Chartered Institution of Hig...,London South Bank University,Civil Engineering,191A,BEng (Hons) Civil Engineering,Institution of Civil Engineers (ICE)
239,10004078,10004078,191A,2,3002,0,,,Accredited by the Chartered Institution of Hig...,London South Bank University,Civil Engineering,191A,BEng (Hons) Civil Engineering,Institution of Civil Engineers (ICE)
240,10004078,10004078,191A,2,3002,0,,,Accredited by the Chartered Institution of Hig...,London South Bank University,Civil Engineering,191A,BEng (Hons) Civil Engineering,Institution of Civil Engineers (ICE)
247,10004078,10004078,191A,1,10002,0,,,Accredited by the Institution of Structural En...,London South Bank University,Civil Engineering,191A,BEng (Hons) Civil Engineering,Institution of Civil Engineers (ICE)
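
In the output above, several ICE rows carry `ACCTEXT` values mentioning other bodies (the Chartered Institution of Hig... and the Institution of Structural En...), and some rows repeat. That is consistent with the In [25] merge joining on `KISCOURSEID` alone when a course has several rows on both sides: every `ACCTEXT` row for the course gets paired with every `AccreditingBodyName` row. A sketch with toy data (the second body name is hypothetical) shows the effect, and how the `validate` argument of `pd.merge()` can flag a join that isn't the shape you expected. Exact duplicate rows can be removed separately with `drop_duplicates()`.

```python
import pandas as pd

# Toy data: one course with two accreditation texts on one side
# and two accrediting bodies on the other (second body is hypothetical)
accreditation = pd.DataFrame({
    "KISCOURSEID": ["191A", "191A"],
    "ACCTEXT": ["Accredited by the Chartered Institution of Hig...",
                "Accredited by the Institution of Structural En..."],
})
accrediting = pd.DataFrame({
    "KiscourseID": ["191A", "191A"],
    "AccreditingBodyName": ["Institution of Civil Engineers (ICE)",
                            "Hypothetical Other Body"],
})

# Joining on the course ID alone gives 2 x 2 = 4 rows
crossed = pd.merge(accreditation, accrediting,
                   left_on="KISCOURSEID", right_on="KiscourseID")
print(len(crossed))  # 4

# validate="many_to_one" raises MergeError if the right-hand key is not unique
try:
    pd.merge(accreditation, accrediting,
             left_on="KISCOURSEID", right_on="KiscourseID",
             validate="many_to_one")
    is_many_to_one = True
except pd.errors.MergeError as err:
    is_many_to_one = False
    print("Not many-to-one:", err)
```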


### Named Entity Recognition

Just as a hack aside, we can also try to identify accrediting bodies from the `ACCTEXT` field using a named entity recognition (NER) parser. For example, the `spacy` natural language processing Python package provides a simple way of identifying named entities.

In [11]:
%%capture
try:
    import spacy
    nlp = spacy.load('en_core_web_sm')
except (ImportError, OSError):
    # ImportError: spacy not installed; OSError: language model not downloaded
    %pip install spacy
    %pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0-py3-none-any.whl

# YOU MAY NEED TO RESTART THE NOTEBOOK KERNEL IF THE MODEL WAS DOWNLOADED

Load in the `spacy` package and a simple English language model:

In [12]:
import spacy

nlp = spacy.load('en_core_web_sm')

We can extract named entities from a string using the following approach:

In [12]:
txt = 'Recognised by the General Chiropractic Council (GCC) for the purpose of eligibility for registration with that body.'

# Parse the text
doc = nlp(txt)

# Access any recognised entities
doc.ents

(the General Chiropractic Council, GCC)

We can create a simple function to return the first named entity found, tidied up a little:

In [13]:
def get_entitities(txt):
    """Find the first named entity in txt and tidy it up.

    Returns an empty string if no entity is found.
    """
    doc = nlp(txt)

    entities = [ent.text for ent in doc.ents]
    if entities:
        entity = entities[0]

        # Cleaning rule: drop a leading lowercase "the "
        if entity.startswith('the '):
            entity = entity[len('the '):]
        return entity.strip()
    return ''

We can then apply this to the `ACCTEXT` data to guess at the name of each accrediting body. Let's try it on the first few unique texts:

In [14]:
pd.Series(accreditation_df["ACCTEXT"].unique()).head().apply(get_entitities)

0                      General Chiropractic Council
1    Association of Chartered Certified Accountants
2     Chartered Institute of Management Accountants
3       Institute of Leadership and Management (ILM
4                Institute of Chartered Accountants
dtype: object
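
Some of the guesses end with an unclosed parenthesis, such as `Institute of Leadership and Management (ILM`, and a leading capitalised `The` slips past the lowercase `'the '` rule. A couple of extra cleaning rules could tidy these up. This is a hypothetical helper, not part of the original notebook; dropping the dangling parenthetical (rather than trying to complete it) avoids guessing at a possibly truncated acronym.

```python
import re


def tidy_entity(entity):
    """Hypothetical extra cleaning rules for NER output (not from the original notebook)."""
    entity = entity.strip()
    # Drop a leading article whatever its case ("the " or "The ")
    entity = re.sub(r"^the\s+", "", entity, flags=re.IGNORECASE)
    # If an opening parenthesis is never closed, e.g. "... (ILM", drop it and what follows
    if entity.count("(") > entity.count(")"):
        entity = entity[:entity.rfind("(")]
    return entity.strip()


for raw in ["Institute of Leadership and Management (ILM",
            "The Chartered Society of Forensic Sciences",
            "the General Chiropractic Council",
            "Royal Institution of Chartered Surveyors (RICS"]:
    print(tidy_entity(raw))
```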

If we create a lookup of unique `ACCTEXT` items, we can run the named entity recognition routine over each distinct text just once and then merge the results back into the original dataset.
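
The point of the lookup is that the (relatively slow) NER step runs once per distinct text rather than once per row. An alternative to merging the lookup back in is to build a dictionary and use `Series.map`. In this sketch, `fake_extract` is a hypothetical stand-in for `get_entitities` that records each call, so you can see it only runs twice for four rows.

```python
import pandas as pd

calls = []


def fake_extract(txt):
    """Hypothetical stand-in for get_entitities(); records each call."""
    calls.append(txt)
    return txt.split(" by the ")[1].split(" (")[0]


df = pd.DataFrame({
    "KISCOURSEID": ["MCCF-B320", "5008A", "CHIIMS_FT", "4667A"],
    "ACCTEXT": [
        "Recognised by the General Chiropractic Council (GCC)",
        "Recognised by the General Chiropractic Council (GCC)",
        "Recognised by the General Chiropractic Council (GCC)",
        "Accredited by the Association of Chartered Certified Accountants",
    ],
})

# Run the extraction once per distinct text...
lookup = {txt: fake_extract(txt) for txt in df["ACCTEXT"].unique()}

# ...then map the results back onto every row
df["ENTITY"] = df["ACCTEXT"].map(lookup)

print(len(calls))  # 2
print(df["ENTITY"].tolist())
```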

So let's get the unique texts:

In [15]:
unique_bodies = pd.DataFrame(accreditation_df["ACCTEXT"].unique(), columns=["ACCTEXT"])
unique_bodies.head()

Unnamed: 0,ACCTEXT
0,Recognised by the General Chiropractic Council...
1,Accredited by the Association of Chartered Cer...
2,Accredited by the Chartered Institute of Manag...
3,Accredited by the Institute of Leadership and ...
4,Accredited by the Institute of Chartered Accou...


And run the NER routine over them:

In [16]:
unique_bodies["ENTITY"] = unique_bodies["ACCTEXT"].apply(get_entitities)
unique_bodies.head(3)

Unnamed: 0,ACCTEXT,ENTITY
0,Recognised by the General Chiropractic Council...,General Chiropractic Council
1,Accredited by the Association of Chartered Cer...,Association of Chartered Certified Accountants
2,Accredited by the Chartered Institute of Manag...,Chartered Institute of Management Accountants


So what sorts of thing have we got?

In [17]:
unique_bodies["ENTITY"].to_list()

['General Chiropractic Council',
 'Association of Chartered Certified Accountants',
 'Chartered Institute of Management Accountants',
 'Institute of Leadership and Management (ILM',
 'Institute of Chartered Accountants',
 'Chartered Institute of Marketing',
 '',
 'Chartered Institute of Architectural Technologists',
 'Chartered Institute of Building (CIOB',
 'Royal Institution of Chartered Surveyors (RICS',
 '',
 '',
 'Health and Care Professions Council',
 'The Chartered Society of Forensic Sciences',
 'Architects Registration Board',
 'Royal Institute of British Architects',
 'Graduate Basis for Chartered Membership (GBC',
 'Health and Care Professions Council',
 'College of Operating Department Practitioners',
 'Chartered Institute of Personnel and Development',
 'BCS',
 'ScreenSkills',
 'Broadcasting Journalism Training Council',
 'Institution of Engineering Designers',
 'Institution of Engineering Designers',
 'Health and Care Professions Council',
 'Royal College of Occupational 

Okay, so there are some possible misses there... But this is a bar-room conversation, right? We're just having a natter to get the gist of what's going on. The outlier detail is where the pros get to say why conversations like this are misleading and why you need to do things properly... But for the rest of us, "mostly not wrong" is often good enough, as long as we remember that our source might be a little unreliable...
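
If you do want to see where the NER step came up empty, a quick filter on the empty strings lists the texts worth eyeballing, and `value_counts()` on the rest shows which guessed names occur most often. This sketch runs on a toy version of `unique_bodies`; the third text is made up.

```python
import pandas as pd

# Toy version of unique_bodies: an empty ENTITY means nothing was found
unique_bodies = pd.DataFrame({
    "ACCTEXT": [
        "Recognised by the General Chiropractic Council (GCC) for the purpose "
        "of eligibility for registration with that body.",
        "Accredited by the Association of Chartered Certified Accountants",
        "Some text where no entity was found",
    ],
    "ENTITY": ["General Chiropractic Council",
               "Association of Chartered Certified Accountants",
               ""],
})

# Which texts did the NER step fail on?
misses = unique_bodies[unique_bodies["ENTITY"] == ""]
print(f"{len(misses)} of {len(unique_bodies)} texts had no entity")
for txt in misses["ACCTEXT"]:
    print("-", txt)

# How often does each (non-empty) guessed body name occur?
print(unique_bodies.loc[unique_bodies["ENTITY"] != "", "ENTITY"].value_counts())
```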

We can also merge the entities back into the original dataset:

In [18]:
accreditation_df = pd.merge(accreditation_df, unique_bodies, on="ACCTEXT")

accreditation_df.head(3)

Unnamed: 0,PUBUKPRN,UKPRN,KISCOURSEID,KISMODE,ACCTYPE,ACCDEPEND,ACCDEPENDURL,ACCDEPENDURLW,ACCTEXT,NAME,TITLE,ENTITY
0,10000163,10000163,MCCF-B320,1,5701,0,,,Recognised by the General Chiropractic Council...,AECC Chiropractic College,Chiropractic,General Chiropractic Council
1,10004078,10004078,5008A,1,5701,0,,,Recognised by the General Chiropractic Council...,London South Bank University,Chiropractic,General Chiropractic Council
2,10007161,10007161,CHIIMS_FT,1,5701,0,,,Recognised by the General Chiropractic Council...,Teesside University,Chiropractic,General Chiropractic Council
