<img src="https://crowdmark-com.s3.amazonaws.com/website/images/visual-identity/crowdmark-logo-dark.png" alt="Crowdmark logo" width ="180" align="right">

# Page Images Extracted for Local Storage

[Crowdmark API Guide (draft version)](https://gist.github.com/heycarsten/46060b3cfce1eaeed325ddd3cdb79f0b)

[Draft Documentation Site](https://crowdmark-api-docs.surge.sh/readme/)

**Goals:** Use the Crowdmark API to

1. Extract front page images for all students from a given assessment on Crowdmark.
2. Associate front page images with the corresponding student metadata.
3. Extract all page images for all booklets as clickable thumbnails.


In [None]:
## We will bring the data from Crowdmark into this notebook by importing JSON.
# import pandas as pd ## We don't currently use pandas.
# from pandas.io.json import json_normalize
import json
import os

## Enter your Crowdmark API key

In [None]:
## Execute this cell to generate a request for your API key.
## Uncomment the next line to define your API key, then paste the key into the text field and press enter.
# api = input("What is your Crowdmark API key:")

In [None]:
## This cell writes the content of `api` to the file .crowdmark-key.

#with open(".crowdmark-key", "w") as text_file:
#   print(f"{api}", file=text_file)

In [None]:
# This cell reads in the .crowdmark-key file and saves it as apiKey.
# The API key allows the computer hosting your Jupyter notebook to programmatically access data from Crowdmark.
with open(".crowdmark-key", 'r') as f:
    apiKey = f.read().rstrip()
# apiKey
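
The cell above raises `FileNotFoundError` if `.crowdmark-key` does not exist. A minimal sketch of a more forgiving loader follows. The function name `load_api_key` and the `CROWDMARK_API_KEY` environment variable are illustrative assumptions, not part of Crowdmark's tooling.

```python
import os


def load_api_key(path=".crowdmark-key", env_var="CROWDMARK_API_KEY"):
    """Return the API key stored in `path`, else fall back to an environment variable.

    Both the function name and `env_var` are hypothetical conveniences.
    """
    if os.path.exists(path):
        with open(path, "r") as f:
            key = f.read().strip()
        if key:
            return key
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError("No API key found in " + path + " or $" + env_var)
    return key
```

With this helper, the assignment in the cell above could become `apiKey = load_api_key()`.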

## Connect to Scores Data on Crowdmark

## Authentication with Crowdmark API Key

In [None]:
## Carsten Nielsen helped me with pagination.
from urllib.request import urlopen, Request
def api_get(path):
    req = Request('https://app.crowdmark.com/api' + path)
    req.add_header('Authorization', 'Token token="' + apiKey + '"')
    res = urlopen(req).read().decode('utf8')
    return json.loads(res)
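
To check what `api_get` sends without contacting Crowdmark, the request can be built and inspected separately. This is a sketch only: `build_request` and `API_ROOT` are hypothetical names, and the key shown is a placeholder.

```python
from urllib.request import Request

# Same base URL that api_get uses.
API_ROOT = 'https://app.crowdmark.com/api'


def build_request(path, api_key):
    """Build (but do not send) an authenticated request for `path`."""
    req = Request(API_ROOT + path)
    req.add_header('Authorization', 'Token token="' + api_key + '"')
    return req


# Placeholder path and key, used only to inspect the request offline.
req = build_request('/booklets/example-id', 'not-a-real-key')
print(req.full_url)
print(req.get_header('Authorization'))
```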

## Specifying the Assessment for Analysis

![assessment_slug](https://wwejubwfy.s3.amazonaws.com/WUSTL_Demonstration_Assessment__Crowdmark_2017-01-07_15-58-14.png)

In [None]:
# Define the assessment to investigate.
# Copy the assessment slug from the URL of the assessment's dashboard on Crowdmark.
# https://app.crowdmark.com/exams/<assessment-slug>/dashboard
assessment_slug = 'math-sample-assessment-fe8c2'
# assessment_slug = 'comc2016-mock2017-en'
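
Instead of copying the slug by hand, it can be parsed out of the dashboard URL. The sketch below assumes the URL pattern shown in the comment above (`/exams/<assessment-slug>/dashboard`); the function name `slug_from_dashboard_url` is hypothetical.

```python
from urllib.parse import urlparse


def slug_from_dashboard_url(url):
    """Return the assessment slug from a URL shaped like /exams/<slug>/dashboard."""
    parts = [p for p in urlparse(url).path.split('/') if p]
    if len(parts) >= 2 and parts[0] == 'exams':
        return parts[1]
    raise ValueError('Not a Crowdmark assessment URL: ' + url)


print(slug_from_dashboard_url(
    'https://app.crowdmark.com/exams/math-sample-assessment-fe8c2/dashboard'))
```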

In [None]:
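# Base URL of the Crowdmark web app; the two helper functions below build their request URLs from it.
cm = 'https://app.crowdmark.com'

# GET /api/booklets/{booklet-id}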
def getbooklet(e8):
    booklet_resp = urlopen(cm + '/api/booklets/' + e8 + '?api_key=' + apiKey ).read().decode('utf8')
    return json.loads(booklet_resp)


# GET /api/enrollments/{enrollment-id}
def getenrollment(e2):
    enrollment_resp = urlopen(cm + '/api/enrollments/' + e2 + '?api_key=' + apiKey ).read().decode('utf8')
    return json.loads(enrollment_resp)


## Accessing the Data

In [None]:
## Pull in Booklets Data on Assessment (with assessment_slug) for Analysis in this Notebook
## Carsten Nielsen helped me with pagination of the data payload from Crowdmark.
from urllib.request import urlopen, Request

page = 1
booklets = []

while True:
    print('Getting page ' + str(page) + '...')
    res = api_get('/assessments/' + assessment_slug + '/booklets?page[number]=' + str(page))
    booklets.extend(res['data'])
    if not res['links'].get('next'): break
    page += 1

print(len(booklets))

## Our data set has been pulled from the Crowdmark API and is now stored in `booklets` as a list of Python dictionaries.
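
The pagination loop above can be factored into a reusable function and exercised offline. In this sketch, `fetch_all` and `fake_pages` are hypothetical; the fake pages only mimic the `data` and `links` keys that the loop relies on.

```python
def fetch_all(get_page):
    """Collect the `data` lists from every page.

    `get_page(n)` must return a dict with a `data` list and a `links` dict,
    the same response shape the loop above reads.
    """
    items = []
    page = 1
    while True:
        res = get_page(page)
        items.extend(res['data'])
        # Stop when the response no longer advertises a next page.
        if not res.get('links', {}).get('next'):
            return items
        page += 1


# Offline stand-in for the API: two pages of invented records.
fake_pages = {
    1: {'data': [{'id': 'a'}, {'id': 'b'}], 'links': {'next': 'page-2'}},
    2: {'data': [{'id': 'c'}], 'links': {}},
}
all_items = fetch_all(lambda n: fake_pages[n])
print(len(all_items))
```

Against the real API, the callable would wrap `api_get`, for example `lambda n: api_get('/assessments/' + assessment_slug + '/booklets?page[number]=' + str(n))`.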

## Exploration

Let's play around a bit to see what's in the data payload.

In [None]:
booklets[8]

In [None]:
booklets[8]['attributes']['total-points']

The record `booklets[8]` describes the ninth booklet, because Python lists are indexed from zero. For example, it shows the identification number of the `enrollment` associated with that booklet.
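
A small helper makes this lookup safe for booklets that have no student attached. This is a sketch under assumptions: `enrollment_id` is a hypothetical name, the sample records are invented, and they mirror only the keys this notebook reads elsewhere (`relationships`, `enrollment`, `data`, `id`).

```python
def enrollment_id(record):
    """Return the enrollment id linked to a booklet record, or None if there is none."""
    rel = record.get('relationships', {}).get('enrollment', {})
    data = rel.get('data')
    if isinstance(data, dict):
        return data.get('id')
    return None


# Invented sample records for illustration only.
matched = {
    'id': 'booklet-1',
    'attributes': {'total-points': 10},
    'relationships': {'enrollment': {'data': {'id': 'enr-1'}}},
}
unmatched = {'id': 'booklet-2', 'relationships': {'enrollment': {'data': None}}}

print(enrollment_id(matched))
print(enrollment_id(unmatched))
```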

***
***

# Front Pages

The next cell generates a Markdown file containing the front page images from the assessment. The Markdown file can be converted to HTML. The embedded image links expire a few hours after they are generated.

In [None]:
## Export links to front page images with associated booklet and student information (if available)
## Print as output to this cell. 
## Accumulate output in front_pages.md file

## Top Matter: Reminder of which assessment we are examining
g = open('front_pages.md', 'w')
print("assessment_slug:" + assessment_slug + "; front pages \n")
print('<img src="https://crowdmark-com.s3.amazonaws.com/website/images/visual-identity/crowdmark-logo-dark.png" alt="Crowdmark logo" width ="180" align="right">')
g.write("assessment_slug:" + assessment_slug + "; front pages \n")
for booklet in booklets:
    booklet_id = booklet['id']
    ## Information we know about the booklet we are examining.
    
    res = api_get('/booklets/' + booklet_id + '/pages')
    
    ## Set an exception to ignore empty booklets
    if res['data'] != []:
        g.write("--------------------------- \n")
        print("--------------------------- \n")
        g.write("### Booklet:" + booklet_id +"\n") 
        print("### Booklet:" + booklet_id +"\n") 
        ## Set an exception to ignore empty records
        ## Fetch the booklet record once and reuse it.
        enrollment_data = getbooklet(booklet_id)['data']['relationships']['enrollment']['data']
        if isinstance(enrollment_data, dict):
            enr = enrollment_data['id']
            g.write("### Enrollment:" + enr +"\n")
            print("### Enrollment:" + enr +"\n")
            ## Exception handling for empty records
            ## Fetch the enrollment record once and reuse it.
            student = getenrollment(enr)['data']['attributes']['metadata']
            if student != {}:
                g.write("### Student:" + student['Last Name'] + "," + student['First Name'] + "\n")
                print("### Student:" + student['Last Name'] + "," + student['First Name'] + "\n")
        # print(booklet)
        ## Note: the choice of res['data'][ZERO] points at the front page of the booklet.
        ## Variation: Change to res['data'][2] to extract third pages of all booklets.
        ## Print in Markdown format.
        g.write("![" + booklet_id + "](" + res['data'][0]['attributes']['url'] + ")\n")
        print(("![" + booklet_id + "](" + res['data'][0]['attributes']['url'] + ")\n"))
        ## Print as clickable thumbnails when rendered in HTML
        print('<a href="' + res['data'][0]['attributes']['url'] + '"><img src ="' 
                 + res['data'][0]['attributes']['url'] + '" width="400"></a>\n')
        g.write('<a href="' + res['data'][0]['attributes']['url'] + '"><img src ="' 
                 + res['data'][0]['attributes']['url'] + '" width="400"></a>\n')
                    
g.close()
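
Because the embedded links expire, keeping the images means downloading them while the links are still valid. The following is a minimal sketch, not part of the original notebook: `save_page_image` and its `read_bytes` parameter are hypothetical, and the file name is left to the caller because the image format is not stated here.

```python
import os
from urllib.request import urlopen


def save_page_image(url, dest_dir, filename, read_bytes=None):
    """Download `url` and store it as dest_dir/filename; return the saved path.

    `read_bytes` is an optional callable (url -> bytes) so the function can be
    tried without network access. By default it uses urlopen.
    """
    if read_bytes is None:
        read_bytes = lambda u: urlopen(u).read()
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, filename)
    with open(path, 'wb') as f:
        f.write(read_bytes(url))
    return path
```

Inside the loop above, a call such as `save_page_image(res['data'][0]['attributes']['url'], 'front_pages', booklet_id)` would store one file per booklet; append whatever extension matches the actual image type.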

## All Pages

The next cell generates a Markdown file containing all page images from the assessment. The Markdown file can be converted to HTML. The embedded image links expire a few hours after they are generated.

In [None]:
## Export links to all page images with associated booklet and student information (if available)
## Print as output to this cell. 
## Accumulate output in all_pages.md file
## Experiments with this script suggest that the output may need to be paginated! It may crash the analysis server.

## Top Matter: Reminder of which assessment we are examining
g = open('all_pages.md', 'w')
print("assessment_slug:" + assessment_slug + "; all pages \n")
print('<img src="https://crowdmark-com.s3.amazonaws.com/website/images/visual-identity/crowdmark-logo-dark.png" alt="Crowdmark logo" width ="180" align="right">')
g.write("assessment_slug:" + assessment_slug + "; all pages \n")
for booklet in booklets:
    booklet_id = booklet['id']
    ## Information we know about the booklet we are examining.
    
    res = api_get('/booklets/' + booklet_id + '/pages')
    
    
    ## Set an exception to ignore empty booklets
    if res['data'] != []:
        g.write("--------------------------- \n")
        print("--------------------------- \n")
        g.write("### Booklet:" + booklet_id +"\n") 
        print("### Booklet:" + booklet_id +"\n") 
        ## Set an exception to ignore empty records
        ## Fetch the booklet record once and reuse it.
        enrollment_data = getbooklet(booklet_id)['data']['relationships']['enrollment']['data']
        if isinstance(enrollment_data, dict):
            enr = enrollment_data['id']
            g.write("### Enrollment:" + enr +"\n")
            print("### Enrollment:" + enr +"\n")
            ## Fetch the enrollment record once and reuse it.
            student = getenrollment(enr)['data']['attributes']['metadata']
            if student != {}:
                g.write("### Student:" + student['Last Name'] + "," + student['First Name'] + "\n")
                print("### Student:" + student['Last Name'] + "," + student['First Name'] + "\n")
        # print(booklet)
        print("<ul>\n")
        g.write("<ul>\n")
        ## Loop through all pages in the booklet
        for page_record in res['data']:
            url = page_record['attributes']['url']
            ## Print in Markdown format.
            g.write("![" + booklet_id + "](" + url + ")\n")
            print("![" + booklet_id + "](" + url + ")\n")
            ## Print as clickable thumbnails when rendered in HTML
            g.write('<a href="' + url + '"><img src ="' + url + '" width="200"></a>\n')
            print('<a href="' + url + '"><img src ="' + url + '" width="200"></a>\n')
        print("</ul><br>\n")
        g.write("</ul><br>\n")
                    
g.close()
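
The comment at the top of the cell above notes that the output may need to be paginated. One possible approach, sketched here as an assumption rather than a tested fix, is to process the booklets in fixed-size batches and write one output file per batch. The helper name `chunked` and the batch size are hypothetical.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Stand-in data: seven placeholder booklets split into batches of three.
batches = list(chunked(list(range(7)), 3))
print(batches)
```

In the notebook, looping with `for n, batch in enumerate(chunked(booklets, 25)):` and opening a file such as `'all_pages_' + str(n) + '.md'` for each batch would keep every output file, and the cell output produced per batch, smaller.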