# A Data Scrape of Alberta Education Learning Outcomes

The Alberta Ministry of Education hosts teaching resources relevant to K-12 programs of study at [learnalberta.ca](http://www.learnalberta.ca/ProgramsOfStudy.aspx). Many of these resources set out curriculum learning outcomes. For the more codified programs, we can extract those outcomes and visualize their structure.

---

### 1. Get the Data


We will scrape what we can from queries of [learnalberta.ca](http://www.learnalberta.ca/ProgramsOfStudy.aspx) using `requests`.

```python
url = 'http://www.learnalberta.ca/ProgramsOfStudy.aspx'

# point to and query url
from requests import get, utils
r = get(url); r.ok
```

```
True
```

$\uparrow \ $ If we see `True` above, then our first request worked. With this response, we can proceed.

The `BeautifulSoup` package offers us a way to parse the web content we receive.

We first detect the encoding of the response.

```python
# decide which encoding to use
from bs4.dammit import EncodingDetector
httpEncoding = r.encoding if 'charset' in r.headers.get('content-type', '').lower() else None
htmlEncoding = EncodingDetector.find_declared_encoding(r.content, is_html=True)
encoding = htmlEncoding or httpEncoding; encoding
```

```
'utf-8'
```
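The rule `htmlEncoding or httpEncoding` prefers a charset declared inside the HTML over the one in the HTTP `Content-Type` header. A minimal stdlib-only sketch of that precedence, where `choose_encoding` is a hypothetical helper and the HTML-declared encoding is assumed to have been found already:

```python
from email.message import Message
from typing import Optional


def http_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header, or None."""
    msg = Message()
    msg['content-type'] = content_type
    return msg.get_content_charset()  # lowercased, or None if absent


def choose_encoding(html_declared: Optional[str], content_type: str) -> Optional[str]:
    """Prefer the encoding declared in the HTML; fall back to the HTTP header."""
    return html_declared or http_charset(content_type)


print(choose_encoding('utf-8', 'text/html; charset=ISO-8859-1'))  # utf-8
print(choose_encoding(None, 'text/html; charset=ISO-8859-1'))     # iso-8859-1
print(choose_encoding(None, 'text/html'))                          # None
```

Checking for `charset` in the header, as the cell above does, matters because `requests` reports a default encoding for `text/*` responses even when the server declares none.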

Now we parse. We extract the names of programs of study to see that things are working.

```python
# begin parsing HTML
from bs4 import BeautifulSoup as bs
soup = bs(r.content, "lxml", from_encoding=encoding)

# the next element after each <strong> tag holds a field name (e.g. 'Core Programs');
# that field's <option> list sits inside the element three steps on
labels = soup.find_all('strong')
searchFields = [i.find_next() for i in labels]
searchOptions = [i.find_next().find_next().find_next().find_all("option")[1:]  # skip the first option, presumably a placeholder
                 for i in labels]
fieldsOptionsReference = dict(zip(searchFields, searchOptions))

# print programs
for field, options in fieldsOptionsReference.items():
    print(field.text)
    for opt in options:
        print('\t', opt.text)
```

```
Core Programs
	 English Language Arts
	 Fine Arts
	 Health / Career & Life Management
	 Mathematics
	 Physical Education
	 Science
	 Social Studies
Complementary Programs
	 Aboriginal Studies
	 CTF
	 CTS: Apprenticeship
	 CTS: Business, Administration, Finance & Information Technology (BIT)
	 CTS: Career Transitions (CTR)
	 CTS: Health, Recreation & Human Services (HRH)
	 CTS: Media, Design & Communication Arts (MDC)
	 CTS: Natural Resources (NAT)
	 CTS: Trades, Manufacturing & Transportation (TMT)
	 English as a Second Language
	 Fine Arts
	 First Nations, Métis and Inuit (FNMI) Languages
	 French as a Second Language (FSL)
	 International Languages
	 Knowledge and Employability Occupational Courses
```


---

### 2. Select a Program & Course

Once we choose a subject, we can preview specific programs.

```python
subject = 'Mathematics'

# find links to programs of study for the chosen subject
from urllib.parse import urlparse, parse_qs
from learnAlberta import soupForField
programs = soupForField(subject).find_all('a', href=lambda href: href and "ProgramOfStudy" in href)

# map each link's ProgramId query parameter to the program's display name
programReference = {}
for p in programs:
    query = parse_qs(urlparse(p['href']).query)
    programReference[query['ProgramId'][0]] = p.text.strip()
programReference
```

```
{'174398': 'Mathematics (K & E) 10-4, 20-4',
 '26061': 'Mathematics Kindergarten to Grade 9 (2007, Updated 2016)',
 '348234': 'Mathematics Grade 10 - 12',
 '432948': 'Mathematics 31',
 '447048': 'Mathematics (K & E) Grade 8 - 9'}
```
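Each program link carries its identifier in the `ProgramId` query parameter, and `urllib.parse.parse_qs` extracts it without splitting strings by hand. A small standalone sketch; the `(href, text)` pairs are hypothetical, shaped like the `ProgramOfStudy.aspx?lang=en&ProgramId=348234` link cited later in this notebook:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical (href, link text) pairs, shaped like the program links on the page
links = [
    ('ProgramOfStudy.aspx?lang=en&ProgramId=348234', ' Mathematics Grade 10 - 12 '),
    ('ProgramOfStudy.aspx?ProgramId=26061&lang=en',
     'Mathematics Kindergarten to Grade 9 (2007, Updated 2016)'),
]


def program_id(href):
    """Return the ProgramId query parameter of a link, or None if absent."""
    values = parse_qs(urlparse(href).query).get('ProgramId')
    return values[0] if values else None


reference = {program_id(href): text.strip() for href, text in links}
print(reference)
# {'348234': 'Mathematics Grade 10 - 12', '26061': 'Mathematics Kindergarten to Grade 9 (2007, Updated 2016)'}
```

Unlike `q.split('=')`, `parse_qs` tolerates parameter order, missing values, and extra `=` signs inside values.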

$\uparrow \ $ We choose a `programID` from those above, and specify a course code, e.g. 20-1.

```python
programID = "348234"
course = '20-1'
```

---

### 3. Isolate Learning Outcomes

Each high school mathematics course has its general outcomes formatted as headers.

On the page, these headers are `div` elements with class `title`, so `BeautifulSoup` can pick them out and give us a structure to isolate.

```python
# find title headers that mention the course code
from learnAlberta import soupForProgram
programData = soupForProgram(programID)
rationaleSections = programData.find_all('div', class_='title')
for i, section in enumerate(rationaleSections):  # 'section', not 'r', to keep the response intact
    if course in section.text:
        print(i, ':', section.text)
```

```
29 : 20-1 
30 : Algebra & Number (20-1)
37 : Trigonometry (20-1)
41 : Relations & Functions (20-1)
```


$\uparrow \ $ Besides the course header itself, we should see the names of the general outcomes in Mathematics 20-1.

We can go further and isolate specific learning outcomes, which we will represent as `vertices` of a [graph](https://en.wikipedia.org/wiki/Graph).

For brevity, we use Python functions from an accompanying file, `learnAlberta.py`.

```python
# generate vertices and processes data
from learnAlberta import removeBrackets, groupByOutcomeGroup
# pair each general-outcome header with its initials, e.g. 'Algebra & Number (20-1)' -> 'AN'
outcomeGroups = [(i, ''.join(c for c in removeBrackets(section.text) if c.isupper()))
                 for (i, section) in enumerate(rationaleSections)
                 if course in section.text and '(' in section.text]
vertices, processes = groupByOutcomeGroup(subject, course, rationaleSections, outcomeGroups)
```
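The group codes in outcome names such as `MA20-1.AN1` come from the initials of each general-outcome header. `removeBrackets` lives in `learnAlberta.py` and is not shown here, so this sketch substitutes a hypothetical regex-based `remove_brackets`, assumed to drop parenthesized text, and applies it to the headers printed above:

```python
import re


def remove_brackets(text):
    """Hypothetical stand-in for removeBrackets: drop any '(...)' groups."""
    return re.sub(r'\([^)]*\)', '', text)


def initials(header):
    """Keep only the uppercase letters of a header, i.e. its initials."""
    return ''.join(c for c in remove_brackets(header) if c.isupper())


headers = ['Algebra & Number (20-1)', 'Trigonometry (20-1)', 'Relations & Functions (20-1)']
print([initials(h) for h in headers])  # ['AN', 'T', 'RF']
```

Stripping the brackets first only matters when the bracketed text itself contains capitals; a course code like `(20-1)` contributes none either way.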

Our `vertices`, a list of dictionaries with one entry per specific learning outcome, convert directly into a `DataFrame` via `pandas`.

```python
from pandas import DataFrame
df = DataFrame(vertices)
df[:3]  # preview the first three rows
```

|   | description | group | name | processes |
|---|---|---|---|---|
| 0 | Demonstrate an understanding of the absolute v... | 0 | MA20-1.AN1 | [Reasoning, Visualization] |
| 1 | Solve problems that involve operations on radi... | 0 | MA20-1.AN2 | [Connections, Mental Mathematics and Estimatio... |
| 2 | Solve problems that involve radical equations ... | 0 | MA20-1.AN3 | [Communication, Problem Solving, Reasoning] |


One more call, `to_csv`, yields `CSV` data.

```python
print(df[:3].to_csv())
```

```
,description,group,name,processes
0,Demonstrate an understanding of the absolute value of real numbers,0,MA20-1.AN1,"['Reasoning', 'Visualization']"
1,Solve problems that involve operations on radicals and radical expressions with numerical and variable radicands,0,MA20-1.AN2,"['Connections', 'Mental Mathematics and Estimation', 'Problem Solving', 'Reasoning']"
2,Solve problems that involve radical equations (limited to square roots),0,MA20-1.AN3,"['Communication', 'Problem Solving', 'Reasoning']"
```
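Note that `to_csv` writes each `processes` list as its Python representation, e.g. `"['Reasoning', 'Visualization']"`, which other tools will read as a plain string. One option, sketched here with the standard `csv` module rather than `pandas`, is to join each list with a delimiter before export; the `" | "` separator is an arbitrary choice:

```python
import csv
import io

# Two rows of the vertex data shown above
vertices = [
    {'description': 'Demonstrate an understanding of the absolute value of real numbers',
     'group': 0, 'name': 'MA20-1.AN1', 'processes': ['Reasoning', 'Visualization']},
    {'description': 'Solve problems that involve radical equations (limited to square roots)',
     'group': 0, 'name': 'MA20-1.AN3',
     'processes': ['Communication', 'Problem Solving', 'Reasoning']},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['name', 'group', 'description', 'processes'])
writer.writeheader()
for v in vertices:
    # flatten the list so each cell is a single, easily split string
    writer.writerow({**v, 'processes': ' | '.join(v['processes'])})

print(buffer.getvalue())

# reading the file back and splitting on the separator recovers the lists
rows = list(csv.DictReader(io.StringIO(buffer.getvalue())))
print(rows[0]['processes'].split(' | '))  # ['Reasoning', 'Visualization']
```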



---

### 4. Connect the Data

We have yet to connect these learning outcomes.

Each specific learning outcome in mathematics is tied to one or more of seven interrelated "mathematical processes."

These processes aim to "permeate the teaching and learning of mathematics." [$^\dagger$](http://www.learnalberta.ca/ProgramOfStudy.aspx?lang=en&ProgramId=348234#)

```python
# preview processes of first learning outcome
vertices[0]['processes']
```

```
['Reasoning', 'Visualization']
```

Earlier, `groupByOutcomeGroup` also returned `processes`, a dictionary that maps each mathematical process to the names of the specific learning outcomes involving it.

```python
processes['Mental Mathematics and Estimation']
```

```
['MA20-1.AN2',
 'MA20-1.AN4',
 'MA20-1.AN5',
 'MA20-1.T2',
 'MA20-1.RF1A',
 'MA20-1.RF1B',
 'MA20-1.RF1C',
 'MA20-1.RF1D']
```
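`processes` is essentially the inverse of the `processes` field on each vertex. `groupByOutcomeGroup` is not shown here, so the following only sketches how such a mapping could be rebuilt from `vertices` alone, using the three outcomes from the table above; `invert` is a hypothetical helper:

```python
from collections import defaultdict

# Three vertices from the table above (only the fields needed here)
vertices = [
    {'name': 'MA20-1.AN1', 'processes': ['Reasoning', 'Visualization']},
    {'name': 'MA20-1.AN2', 'processes': ['Connections', 'Mental Mathematics and Estimation',
                                         'Problem Solving', 'Reasoning']},
    {'name': 'MA20-1.AN3', 'processes': ['Communication', 'Problem Solving', 'Reasoning']},
]


def invert(vertices):
    """Map each process to the names of the outcomes that list it, in vertex order."""
    by_process = defaultdict(list)
    for v in vertices:
        for process in v['processes']:
            by_process[process].append(v['name'])
    return dict(by_process)


processes = invert(vertices)
print(processes['Reasoning'])        # ['MA20-1.AN1', 'MA20-1.AN2', 'MA20-1.AN3']
print(processes['Problem Solving'])  # ['MA20-1.AN2', 'MA20-1.AN3']
```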

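The `edges` of our graph connect learning outcomes that share one or more of the seven processes. For every process, each ordered pair of outcomes listing it gets an edge of weight 1, so a pair sharing several processes is linked several times.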

```python
from itertools import permutations

# look up each outcome's position in vertices by its name
index = {v['name']: i for (i, v) in enumerate(vertices)}

# for each process, link every ordered pair of outcomes that list it
edges = []
for process, identifiers in processes.items():
    for source, target in permutations(identifiers, 2):
        edges.append({
            'source': index.get(source),  # None if no vertex has this name
            'target': index.get(target),
            'value': 1
        })
```
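Because the loop emits one edge per process per ordered pair, the total edge weight from one outcome to another should equal the number of processes they share. A self-contained check of that property on the three outcomes from the table above, tallying weights with `collections.Counter` and comparing them against set intersections:

```python
from collections import Counter
from itertools import permutations

processes = {
    'Reasoning': ['MA20-1.AN1', 'MA20-1.AN2', 'MA20-1.AN3'],
    'Visualization': ['MA20-1.AN1'],
    'Connections': ['MA20-1.AN2'],
    'Mental Mathematics and Estimation': ['MA20-1.AN2'],
    'Problem Solving': ['MA20-1.AN2', 'MA20-1.AN3'],
    'Communication': ['MA20-1.AN3'],
}
names = ['MA20-1.AN1', 'MA20-1.AN2', 'MA20-1.AN3']
index = {name: i for i, name in enumerate(names)}

# same construction as above: one unit-weight edge per ordered pair per process
edges = [{'source': index[a], 'target': index[b], 'value': 1}
         for identifiers in processes.values()
         for a, b in permutations(identifiers, 2)]

# total weight between each ordered pair of outcomes
weight = Counter()
for e in edges:
    weight[(e['source'], e['target'])] += e['value']

# compare with the size of each pair's shared-process set
per_outcome = {name: {p for p, ids in processes.items() if name in ids} for name in names}
for a, b in permutations(names, 2):
    assert weight[(index[a], index[b])] == len(per_outcome[a] & per_outcome[b])

print(weight[(index['MA20-1.AN2'], index['MA20-1.AN3'])])  # 2
```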

We may also group specific learning outcomes by their general outcomes.

In fact, this is already done! Notice the `group` column in our `DataFrame`.

---

### 5. Visualize Processes

We export the vertices and edges to `JSON`, under the keys `nodes` and `links`.

```python
from json import dump
with open('learnAlberta.json', 'w') as f:
    dump({'nodes': vertices, 'links': edges}, f, indent=4)
```
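The edge lookups yield `None` when a name in `processes` has no matching vertex, and a `null` endpoint in the exported `links` would likely break the matrix layout. A pre-export check might look like the sketch below; `check_links` is a hypothetical helper and the sample nodes and links are made up:

```python
import json


def check_links(nodes, links):
    """Return the links whose source or target is not a valid index into nodes."""
    valid = range(len(nodes))
    return [link for link in links
            if link['source'] not in valid or link['target'] not in valid]


# Hypothetical sample: the second link has a missing target
nodes = [{'name': 'MA20-1.AN1'}, {'name': 'MA20-1.AN2'}]
links = [{'source': 0, 'target': 1, 'value': 1},
         {'source': 1, 'target': None, 'value': 1}]

bad = check_links(nodes, links)
print(bad)              # [{'source': 1, 'target': None, 'value': 1}]
print(json.dumps(bad))  # [{"source": 1, "target": null, "value": 1}]
```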

With [D3.js](https://d3js.org/) we lay out the graph as a matrix.

We are using JavaScript from another accompanying file, `processesMatrix.html`.

```python
from IPython.display import HTML, display
with open('processesMatrix.html', 'r') as g:
    display(HTML(g.read()))
```

$\uparrow \ $ Ordering the matrix "`by Outcome`" groups specific learning outcomes by their general outcome, as desired.

We may reorder the matrix "`by Frequency`" using the dropdown menu to see which learning outcomes overlap most with the others through their processes.

A cell is coloured when its two specific outcomes belong to the same general outcome.

Greyscale cells, then, mark pairs of specific outcomes that belong to different general outcomes but share at least one process.

The opacity of each cell reflects how many processes its pair of outcomes shares, relative to other pairs.

The colourless cells are significant too: they mark pairs of outcomes that share no processes at all, showing where general outcomes do not interact through the processes.

For example, observe above that outcomes `AN5: Perform operations on rational expressions` and `RF7: Solve problems that involve linear and quadratic inequalities in two variables` have no overlap with respect to their processes.

We may compare the processes of `AN5` (Communication, Problem Solving, Technology, Visualization) with those of `RF7` (Connections, Mental Mathematics and Estimation, Reasoning) to see that, indeed, there is **no** pairwise overlap.
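The same comparison can be run for every pair at once by counting shared processes, which, given how the edges were built, is what a cell's weight tallies. A sketch using the process lists quoted above for `AN5` and `RF7`, plus `AN1` from the earlier table; `shared_counts` is a hypothetical helper:

```python
from itertools import combinations

outcome_processes = {
    'AN1': {'Reasoning', 'Visualization'},
    'AN5': {'Communication', 'Problem Solving', 'Technology', 'Visualization'},
    'RF7': {'Connections', 'Mental Mathematics and Estimation', 'Reasoning'},
}


def shared_counts(outcomes):
    """Number of shared processes for each unordered pair of outcomes."""
    return {(a, b): len(outcomes[a] & outcomes[b])
            for a, b in combinations(sorted(outcomes), 2)}


counts = shared_counts(outcome_processes)
print(counts)  # {('AN1', 'AN5'): 1, ('AN1', 'RF7'): 1, ('AN5', 'RF7'): 0}
print([pair for pair, n in counts.items() if n == 0])  # [('AN5', 'RF7')]
```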

---

_Assembled by Eric Easthope_

Visualization directly inspired by [_Les Misérables Co-occurrence_](https://bost.ocks.org/mike/miserables/).

_MIT License_

---