# Collecting responses via Nettskjema API
From Finn's github: https://github.com/finn42/PullingNettskjema/blob/main/PullingNettskjema.ipynb

The following notebook details how to use the Nettskjema API to retrieve responses collected via these webforms. This example combines information from the existing documentation on this API's structure (https://utv.uio.no/docs/nettskjema/api/), code samples for the Python library requests, documentation of the R library nettskjemar, and some trial and error.

In order to access the Nettskjema API, you need:

    1) a token (string key) with editing rights to the form in question
    2) a connection from a suitable IP address (such as on the UiO network)
    3) suitable commands for retrieving the data

## Getting access to the API
In order to access a form through the API, you need to generate an API account for your UiO account, generate a unique token with suitable roles for that API account, and grant that API account editing privileges for the form(s) you want to access via the API. This is the link to set up and edit your API account within the Nettskjema web interface: https://nettskjema.no/user/api/index.html

First create an API account with a simple name and description, then click on that generated account and generate a token with suitable roles and IP restrictions. Tokens are strings that act as keys so the system knows who is logging into the API and that they have permission to do specific things. If you are only downloading responses via the API (instead of setting up and editing forms), your token needs only the roles READ_SUBMISSIONS and READ_FORMS. If you leave the default IP address range, you can access the API from computers on the UiO network. This includes machines logged into remotely (like through VMware Horizon).

When you generate/save the token, the next screen shows you the token string. **COPY AND PASTE THIS INTO A FILE RIGHT AWAY as you will never be able to retrieve it again.** (Though it is easy to just generate another token if needed.) This notebook reads a local directory file called 'nettskjema_token.txt' to get the token string.  

Once you have the token with suitable roles, click over to your form (nettskjema). Under the Permissions section in the settings tab (Rettigheter) there should be a list of Nettskjema users who have editing rights on this form. So long as you are in one of those accounts, you can add your API account as: "*yourapiaccountname*@api". Note: There should not be spaces in the username when you are pasting it in to grant access. You can check if the rights have been granted properly by going back to your API account details (under https://nettskjema.no/user/api/index.html) and making sure the form is listed in the Forms table.

Once the rights have been granted, we can use the token to access the API programmatically. The API documentation gives examples of curl commands that can be run from the command line or terminal (https://utv.uio.no/docs/nettskjema/api/). Below are examples of performing the same tasks with Python's requests library. Also available is an R library developed by the UiO research group LCBC to pull response data directly into R data formats, *nettskjemar*: https://lcbc-uio.github.io/nettskjemar/index.html

# Accessing the API via Requests

In [13]:
import requests
import json
import zipfile
import os
import base64
import time
import shutil

Navigate to the directory storing your *nettskjema_token.txt* file and read the API token stored there.

In [2]:
# os.chdir('M:\\finnu\\kant\\div-ritmo-u1')
# os.chdir('M:\\research')
os.listdir()

['.ipynb_checkpoints',
 'CompressedData',
 'Concert_API',
 'InstOrdData',
 'nettskjema_token.txt',
 'PullingNettskjema.ipynb']

In [3]:
with open('nettskjema_token.txt', 'r') as f:
    TOKEN = f.read().strip() # strip any trailing newline so the Authorization header stays valid

Set up a request session with the token information saved into the authentication settings. This allows us to skip spelling out the token string in this notebook with every API request sent.

In [4]:
session = requests.Session()
session.headers.update({'Authorization': 'Bearer ' + TOKEN})

Test the token by sending a request to see its expiration date.

In [5]:
# confirm the token is working on this IP with the check on expiry date
request_url = "https://nettskjema.no/api/v2/users/admin/tokens/expire-date"
response = session.get(request_url)
response.content

b'{"expireDate":"2024-01-31T12:08:10.000+0100"}'

 If the token is broken or wrong or too old, you will get an error message like: 

`b'{"statusCode":400,"message":"Not token authenticated","errors":null,"nestedErrors":null}'`

If the token is recognised, the output will be just the expiry date:

`b'{"expireDate":"2022-10-18T17:38:29.000+0200"}'`

Note: the 'b' before the response string indicates that the API response is transmitted in bytes. This format matters when storing the collected API response data. The standard method .decode() converts the byte string into a regular Python string.
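
As a small offline illustration of that decoding step, the sketch below parses the sample expiry response shown above and turns the date string into a timezone-aware `datetime`. The `strptime` format string is an assumption based on the example dates in this notebook, not something taken from the API documentation.

```python
import json
from datetime import datetime, timedelta

# Sample byte string copied from the expiry check output above
raw = b'{"expireDate":"2022-10-18T17:38:29.000+0200"}'

# bytes -> str -> dict
payload = json.loads(raw.decode())

# Assumed date layout: ISO-like, with milliseconds and a +HHMM offset
expiry = datetime.strptime(payload["expireDate"], "%Y-%m-%dT%H:%M:%S.%f%z")

print(payload)
print(expiry.isoformat())
```

With `requests`, calling `response.json()` performs the same decode-and-parse in one step.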

## Calling for Form metadata

If the token is working, next request information about the form you want to get data from. For this you need the formID number, a unique integer assigned by Nettskjema when the form was created. This is at the end of the form URL (ex: 225781 in https://nettskjema.no/a/225781). The page describing which forms your API account has access to also includes these ID numbers (https://nettskjema.no/user/api/index.html#/user)

The basic request to retrieve metadata gives the description of who has access and editing rights, some history of the form, and the content of the form. The information returned for requests about forms is JSON, which can easily be converted into Python dictionaries.

It is possible to delete and edit forms through the API, but this isn't described here. The request url formats for these functions should be deducible from the curl commands listed in the API instructions at https://utv.uio.no/docs/nettskjema/api/

In [6]:
# example metadata API request with simple form of one question.

# equivalent curl command
#  $ curl 'https://nettskjema.no/api/v2/forms/225781' -i -X GET -H 'Authorization: Bearer TOKEN'

formID = 284191
request_url = 'https://nettskjema.no/api/v2/forms/' + str(formID)
response = session.get(request_url) # using the request session call which includes the saved API token
form_metadata = json.loads(response.content.decode()) # interpret received string into a python native datatype
form_metadata # show the information output

{'formId': 284191,
 'languageCode': 'en',
 'title': 'LIVELab Motion Data',
 'deliveryDestination': 'DATABASE',
 'formType': 'DEFAULT',
 'theme': 'DEFAULT',
 'createdBy': {'personId': 616293,
  'username': 'danasw@uio.no',
  'fullName': 'Dana Swarbrick',
  'name': 'Dana Swarbrick',
  'type': 'LOCAL'},
 'modifiedBy': {'personId': 616293,
  'username': 'danasw@uio.no',
  'fullName': 'Dana Swarbrick',
  'name': 'Dana Swarbrick',
  'type': 'LOCAL'},
 'createdDate': '2022-09-16T21:09:56.000+0200',
 'modifiedDate': '2023-01-31T12:12:47.000+0100',
 'respondentGroup': 'ALL',
 'editorsContactEmail': 'danasw@uio.no',
 'editorsSubmissionEmailType': 'NONE',
 'editors': [{'personId': 1927165,
   'username': 'danasw@api',
   'fullName': 'RITMO',
   'name': 'RITMO',
   'type': 'API'},
  {'personId': 616293,
   'username': 'danasw@uio.no',
   'fullName': 'Dana Swarbrick',
   'name': 'Dana Swarbrick',
   'type': 'LOCAL'}],
 'collectsPersonalData': True,
 'maxSubmissionsPerson': 1,
 'retainRespondentAcce

If you do not have access rights to a form, or if you are trying to access the API from an IP address that isn't in the range specified by your token, you get API errors instead, like:

 `{'statusCode': 403,
 'message': 'No access to form with id 225782.',
 'errors': None,
 'nestedErrors': None}`
 
 `{'statusCode': 404,
 'message': 'Could not find form with id 22578.',
 'errors': None,
 'nestedErrors': None}`
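
Because errors come back as ordinary JSON, a script that loops over many forms may want to stop as soon as one appears. The helper below is a hypothetical sketch (the name `raise_on_api_error` is not part of any library) that assumes error payloads are dictionaries carrying a `statusCode` key, as in the examples above.

```python
def raise_on_api_error(payload):
    """Raise if a decoded API payload looks like an error object.

    Assumption: errors are dicts with 'statusCode' and 'message' keys,
    as in the examples in this notebook. Anything else is passed through.
    """
    if isinstance(payload, dict) and "statusCode" in payload:
        raise RuntimeError(
            f"Nettskjema API error {payload['statusCode']}: {payload.get('message')}"
        )
    return payload


# Example error copied from the text above
sample_error = {'statusCode': 403,
                'message': 'No access to form with id 225782.',
                'errors': None,
                'nestedErrors': None}

try:
    raise_on_api_error(sample_error)
except RuntimeError as err:
    caught = str(err)
    print(caught)
```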
 
 

The addition of '/submissions' to the request URL calls instead for metadata on the responses, called "submissions" by the API. The returned json file is here converted into a list of dictionaries with standard information about each submission (submission ID, created and modified dates, etc.) as well as all the form responses. 

Responses are returned in reverse chronological order: the last response is the first submission in the list. 

In [7]:
formID = 284191
request_url = 'https://nettskjema.no/api/v2/forms/' + str(formID) + '/submissions' 
response = session.get(request_url) # using the request session call which includes the saved API token
sub_metadata = json.loads(response.content.decode()) # interpret received string into a python native datatype

In [8]:
# examples of the last two responses received
print('Most recent submissions: '+str(sub_metadata[:2]))

Most recent submissions: [{'submissionId': 25559872, 'createdDate': '2023-01-31T06:16:01.000+0100', 'modifiedDate': '2023-01-31T06:16:01.000+0100', 'delivered': True, 'answerTime': 0, 'answers': [{'answerId': 144609621, 'questionId': 4870947, 'textAnswer': 'fceec0b7-268e-13aa-f809-b64a3f98c0b5'}, {'answerId': 144609622, 'questionId': 4870950, 'textAnswer': 'data.zip', 'attachments': [{'answerAttachmentId': 482235, 'fileName': 'data.zip', 'mediaType': 'application/zip', 'size': 65663}]}]}, {'submissionId': 25558280, 'createdDate': '2023-01-30T22:59:01.000+0100', 'modifiedDate': '2023-01-30T22:59:01.000+0100', 'delivered': True, 'answerTime': 0, 'answers': [{'answerId': 144605537, 'questionId': 4870947, 'textAnswer': '17cf4405-791f-088b-39ff-b816f9eab2ce'}, {'answerId': 144605536, 'questionId': 4870950, 'textAnswer': 'data.zip', 'attachments': [{'answerAttachmentId': 482189, 'fileName': 'data.zip', 'mediaType': 'application/zip', 'size': 1129}]}]}]


In [9]:
# the first two submissions received
print('Earliest submissions: ' +str(sub_metadata[-2:])) 

Earliest submissions: [{'submissionId': 23063569, 'createdDate': '2022-09-19T17:48:08.000+0200', 'modifiedDate': '2022-09-19T17:48:08.000+0200', 'delivered': True, 'answerTime': 0, 'answers': [{'answerId': 132329985, 'questionId': 4870947, 'textAnswer': '78b0c8ef-dc6a-ce47-eb2d-987ff1dfa003'}, {'answerId': 132329984, 'questionId': 4870950, 'textAnswer': 'data.zip', 'attachments': [{'answerAttachmentId': 419946, 'fileName': 'data.zip', 'mediaType': 'application/zip', 'size': 62101}]}]}, {'submissionId': 23063525, 'createdDate': '2022-09-19T17:45:26.000+0200', 'modifiedDate': '2022-09-19T17:45:26.000+0200', 'delivered': True, 'answerTime': 0, 'answers': [{'answerId': 132329810, 'questionId': 4870947, 'textAnswer': '75e6dffb-6431-4679-77d4-f7dba9f93445'}, {'answerId': 132329811, 'questionId': 4870950, 'textAnswer': 'data.zip', 'attachments': [{'answerAttachmentId': 419944, 'fileName': 'data.zip', 'mediaType': 'application/zip', 'size': 14514}]}]}]
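
If oldest-first order is more convenient for analysis, the list can be re-sorted locally. The sketch below uses trimmed copies of submissions shown above and relies on submission IDs increasing over time.

```python
# Trimmed copies of submissions shown in this notebook (newest first, as returned)
sub_metadata = [
    {'submissionId': 25559872, 'createdDate': '2023-01-31T06:16:01.000+0100'},
    {'submissionId': 25558280, 'createdDate': '2023-01-30T22:59:01.000+0100'},
    {'submissionId': 23063569, 'createdDate': '2022-09-19T17:48:08.000+0200'},
]

# Sorting on the ID gives oldest-first order without parsing any dates
chronological = sorted(sub_metadata, key=lambda sub: sub['submissionId'])
oldest_first_ids = [sub['submissionId'] for sub in chronological]
print(oldest_first_ids)
```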


It is possible to request subsets of responses, specifically all responses after either a specific submission date or submission ID. Submission IDs increase monotonically and are assigned uniquely across all of Nettskjema.no. A convenient trick when downloading responses from an active survey is to call for only those submissions received since the last time the data was downloaded. To call only subsets, the request URL is extended with the query parameter "fromDate=" or "fromSubmissionId=" followed by the appropriately formatted threshold (the first parameter after the URL path is introduced with "?", any further ones with "&").

It is also possible to download only the submission ID field, instead of the full submission details with the addition of "?fields=submissionId". At this time, no other fields can be isolated in this way.

In [37]:
# how to call submissions from after a certain date

# curl command template
# $ curl 'https://nettskjema.no/api/v2/forms/8432376/submissions?fields=submissionId&fromDate=2021-01-11T13%3A43%3A17.486%2B0100' -i -X GET -H 'Authorization: Bearer TOKEN'

#ML Cop: 2021-10-25T08:27:22+01:00 # remember URL encoding: ':' is %3A, '+' is %2B, '-' is %2D https://www.w3schools.com/tags/ref_urlencode.ASP
#date = '2021-10-25T08%3A27%3A22.000%2B0100'

formID = 284191
# Audience Experience Study: September 23, 2022 7:30pm. The concert started slightly late so we can safely assume that this captures meaningful motion data
# 2022-09-23T19:30:00-04:00 # -4 because summer time is UTC-4h with daylight savings time/summer time
# 2022-09-23T19%3A30%3A00.000%2D0400
date = '2022-09-23T19%3A30%3A00.000%2D0400'

request_url = 'https://nettskjema.no/api/v2/forms/' + str(formID) + '/submissions?fields=submissionId&fromDate=' + date

response = session.get(request_url)
subIDs = json.loads(response.content.decode())

print('Submissions since date ' + date + ': ' + str(len(subIDs)))
if len(subIDs)<5:
    print(subIDs)
else:
    print('Most recent submissions: '+ str(subIDs[:2])) # Note: this shows the last 2 submissions, not the first two submissions
    print('Earliest submissions after concert start: ' + str(subIDs[-2:])) # Note: these are the first two submissions from concert start. 

Submissions since date 2022-09-23T19%3A30%3A00.000%2D0400: 10572
Most recent submissions: [{'submissionId': 25559872}, {'submissionId': 25558280}]
Earliest submissions after concert start: [{'submissionId': 23172340}, {'submissionId': 23172339}]
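
Rather than percent-encoding the date by hand, the standard library can do it. The sketch below is a hypothetical helper (`submissions_url` is not part of any library) that rebuilds the URL from the curl template above. Note that `quote` leaves `-` and `.` untouched, which is equivalent to the `%2D` form used in this notebook.

```python
from urllib.parse import quote


def submissions_url(form_id, from_date=None, ids_only=False):
    """Build a submissions URL with optional query parameters (illustrative helper)."""
    url = f"https://nettskjema.no/api/v2/forms/{form_id}/submissions"
    params = []
    if ids_only:
        params.append("fields=submissionId")
    if from_date is not None:
        # safe='' percent-encodes ':' and '+'; '-' and '.' are left as-is
        params.append("fromDate=" + quote(from_date, safe=""))
    return url + ("?" + "&".join(params) if params else "")


# Form ID and date taken from the curl template above
print(submissions_url(8432376, "2021-01-11T13:43:17.486+0100", ids_only=True))
```

Passing a `params` dictionary to `session.get` is another option, since `requests` encodes query values itself.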


Note: Forms that do not collect personal information on Nettskjema do not retain dates in a format that can be used for this kind of range restriction. You will get this error when trying to call a subset of responses by date: 

`{'statusCode': 409, 'message': 'Since the form does not collect personal data, the submissions will not have dates to compare with the fromDate parameter', 'errors': None, 'nestedErrors': None}`

In this case, it is necessary to find a suitable submission ID that corresponds to the same temporal threshold. If you are monitoring an active survey, use the submission ID of the first (`[0]`) entry returned by your last API call.

In [12]:
# how to call submissions from after a certain ID, restricting metadata to submission ID
# curl command template
# $ curl 'https://nettskjema.no/api/v2/forms/225781/submissions?fields=submissionId&fromSubmissionId=16653694' -i -X GET \
#   -H 'Authorization: Bearer TOKEN'

formID = 284191
submissionID = 23172339 # Earliest submission after concert start
request_url = 'https://nettskjema.no/api/v2/forms/' + str(formID) + '/submissions?fields=submissionId&fromSubmissionId=' + str(submissionID)
response = session.get(request_url)
subIDs = json.loads(response.content.decode())
print('Submissions since subID ' + str(submissionID) + ': ' + str(len(subIDs)))
if len(subIDs)<5: # print error message or top responses 
    print(subIDs)
else:
    print(subIDs[0])


Submissions since subID 23172339: 10571
{'submissionId': 25559872}
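
For repeated downloads from an active survey, the threshold can be carried over from one call to the next. The sketch below is a hypothetical helper (`next_threshold` is an illustrative name) that works on the decoded list. It takes the maximum ID instead of trusting list order, and keeps the old threshold when nothing new has arrived.

```python
def next_threshold(submissions, previous_threshold):
    """Return the submission ID to pass as fromSubmissionId on the next call.

    Assumes `submissions` is the decoded list returned by the API.
    """
    if not submissions:
        return previous_threshold
    return max(sub['submissionId'] for sub in submissions)


# IDs copied from the output above
last_call = [{'submissionId': 25559872}, {'submissionId': 25558280}]
threshold = next_threshold(last_call, previous_threshold=23172339)
print(threshold)
```

In the outputs above, the call from ID 23172339 returns one submission fewer than the date-based call and starts at 23172340, which suggests the threshold itself is excluded.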


# Gathering Musiclab phone sensor data
The above commands cover access to forms that collect information strictly through the webform interface. Apps like MusicLab also gather information in different shapes that are stored by Nettskjema as attachments to submissions (responses). These are a bit trickier to retrieve, but still accessible through the API.

Note: the following cells will not run without permissions for the MusicLab form on Nettskjema, but the shape should be the same for any form that collects attachments with submissions.

Notice that the difference between the chunk above and below is just the fields=submissionId section in the request url. Above only returns the submission ID and below returns the full submission.

In [38]:
# get metadata on submissions for the music lab app after a certain submission ID

formID = 284191
submissionID = 23172339 # collect responses from after this submission ID. 
request_url = 'https://nettskjema.no/api/v2/forms/' + str(formID) + '/submissions?fromSubmissionId=' + str(submissionID)
response = session.get(request_url)
subIDs = json.loads(response.content.decode())
print('Submissions since subID ' + str(submissionID) + ': ' + str(len(subIDs)))
if len(subIDs)<5: # print error message or top responses 
    print(subIDs)
else:
    print(subIDs[0]) # print the most recent submission
    print(subIDs[-1]) # print the earliest submission after the concert start (after response 23172339, the threshold at '2022-09-23T19%3A30%3A00.000%2D0400')


Submissions since subID 23172339: 10571
{'submissionId': 25559872, 'createdDate': '2023-01-31T06:16:01.000+0100', 'modifiedDate': '2023-01-31T06:16:01.000+0100', 'delivered': True, 'answerTime': 0, 'answers': [{'answerId': 144609622, 'questionId': 4870950, 'textAnswer': 'data.zip', 'attachments': [{'answerAttachmentId': 482235, 'fileName': 'data.zip', 'mediaType': 'application/zip', 'size': 65663}]}, {'answerId': 144609621, 'questionId': 4870947, 'textAnswer': 'fceec0b7-268e-13aa-f809-b64a3f98c0b5'}]}
{'submissionId': 23172340, 'createdDate': '2022-09-24T01:30:02.000+0200', 'modifiedDate': '2022-09-24T01:30:02.000+0200', 'delivered': True, 'answerTime': 0, 'answers': [{'answerId': 132652669, 'questionId': 4870950, 'textAnswer': 'data.zip', 'attachments': [{'answerAttachmentId': 422499, 'fileName': 'data.zip', 'mediaType': 'application/zip', 'size': 295698}]}, {'answerId': 132652670, 'questionId': 4870947, 'textAnswer': '0eceb1a9-9ba3-a39b-2e8e-f32a5887fa40'}]}


In order to retrieve the sensor data stored in the submission attachment, we have to call for each file individually and save it appropriately. Here are the essential details from one submission out of the metadata called above. 

In [14]:
subn = 0
print('submissionID : ' + str(subIDs[subn]['submissionId']))
for ans in subIDs[subn]['answers']:
    if 'textAnswer' in ans:
        if len(ans['textAnswer'])>12:
            print('Submitting installation: ' + ans['textAnswer'])
    if 'attachments' in ans:
        print(ans['attachments'])

submissionID : 25559872
Submitting installation: fceec0b7-268e-13aa-f809-b64a3f98c0b5
[{'answerAttachmentId': 482235, 'fileName': 'data.zip', 'mediaType': 'application/zip', 'size': 65663}]
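
The same lookup can be wrapped in a small function that checks for the `attachments` key directly. This is a sketch with a hypothetical name (`list_attachments`), run here on a copy of the submission shown above.

```python
def list_attachments(submission):
    """Return (submissionId, answerAttachmentId, fileName) for every attachment.

    Answers without an 'attachments' key are skipped.
    """
    found = []
    for answer in submission.get('answers', []):
        for att in answer.get('attachments', []):
            found.append((submission['submissionId'],
                          att['answerAttachmentId'],
                          att['fileName']))
    return found


# Submission copied from the output above
example = {'submissionId': 25559872, 'answers': [
    {'answerId': 144609622, 'questionId': 4870950, 'textAnswer': 'data.zip',
     'attachments': [{'answerAttachmentId': 482235, 'fileName': 'data.zip',
                      'mediaType': 'application/zip', 'size': 65663}]},
    {'answerId': 144609621, 'questionId': 4870947,
     'textAnswer': 'fceec0b7-268e-13aa-f809-b64a3f98c0b5'}]}

print(list_attachments(example))
```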


So, to call the attachment for that submission, we use the request:

In [15]:
subn = 0 # just calling one as an example

subID = str(subIDs[subn]['submissionId'])
for ans in subIDs[subn]['answers']:
    if 'attachments' in ans:
        attID = str(ans['attachments'][0]['answerAttachmentId'])
request_url = 'https://nettskjema.no/api/v2/submissions/' + subID + '/attachments/' + attID
response = session.get(request_url)

att_dets = json.loads(response.content.decode())
print('fileName: ' + att_dets['fileName'])
print('fileSize: ' + str(att_dets['fileSize']))
print('mediaType: ' + att_dets['mediaType'])
print('content: ' + att_dets['content'][:500] + '...')

fileName: data.zip
fileSize: 65663
mediaType: application/zip
content: UEsDBAoAAAAAANQpP1aVqMo0lf8AAJX/AABEAAAAMjAyMy0wMS0zMVQwNS0xNC00MC4yMDdaX2ZjZWVjMGI3LTI2OGUtMTNhYS1mODA5LWI2NGEzZjk4YzBiNV9kbS5jc3Z0aW1lc3RhbXAsdGltZSx4LHkseixhbHBoYSxiZXRhLGdhbW1hDQoiMjAyMy0wMS0zMVQwNToxNDoyOC44MjlaIiwyNDg1My4yOTk5OTk5ODIxMiwtMS44LDkuMjAwMDAwMDAwMDAwMDAxLDIuOTAwMDAwMDAwMDAwMDAwNCwtMTUuNzAwMDAwMDAwMDAwMDAxLC0xMy4xMDAwMDAwMDAwMDAwMDEsLTQuNA0KIjIwMjMtMDEtMzFUMDU6MTQ6MjguODQ1WiIsMjQ4NzAsLTEuOCw5LjIwMDAwMDAwMDAwMDAwMSwyLjkwMDAwMDAwMDAwMDAwMDQsLTEzLjkwMDAwMDAwMDAwMDAwMiwtMTQsLTMuNjAw...


The Nettskjema API returns attachments as base64-encoded zip files in byte strings. To make these readable, we need to decode the string, save it as a zip file, and then unzip it. Thankfully there are Python libraries for this.  
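
The decode-and-unzip round trip can be tried without the API. The sketch below builds a stand-in for `att_dets['content']` (the file name and CSV text are made up for illustration) and opens the zip in memory with `io.BytesIO`, an alternative to writing `data.zip` to disk first.

```python
import base64
import io
import zipfile

# Build a stand-in for att_dets['content']: a small zip, base64-encoded.
# The file name and CSV header are invented for this example.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('example_dm.csv', 'timestamp,time,x,y,z\n')
content = base64.b64encode(buffer.getvalue()).decode()

# Decode the base64 string back to bytes and open the zip in memory
zip_bytes = base64.b64decode(content)
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    names = zf.namelist()
    first_file = zf.read(names[0]).decode()

print(names)
print(first_file)
```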

First be sure you are in a suitable local folder, then unpack the attachment.

In [16]:
os.mkdir('./Test_API')
os.chdir('Test_API')

Think of "sub" as submission (not subject) in the code below.

In [17]:
subn = 0 # just calling the most recent one as an example

subID = str(subIDs[subn]['submissionId'])
for ans in subIDs[subn]['answers']:
    if 'attachments' in ans:
        attID = str(ans['attachments'][0]['answerAttachmentId'])
request_url = 'https://nettskjema.no/api/v2/submissions/' + subID + '/attachments/' + attID
response = session.get(request_url)

att_dets = json.loads(response.content.decode())

# write the decoded attachment into a zip file
f=open('data.zip', 'wb')
f.write(base64.b64decode(att_dets['content']))
f.close()

# and then unzip that file, leaving a uniquely titled csv, I hope
with zipfile.ZipFile('data.zip', 'r') as zip_ref:
    if not os.path.exists(str(subID)):
        os.mkdir(str(subID))
        zip_ref.extractall('./'+str(subID)) # Not unique filenames so use the unique submission IDs 

print(os.listdir())
os.chdir('./'+str(subID))
print(os.listdir())
os.chdir('..')

['25559872', 'data.zip']
['2023-01-31T05-14-40.207Z_fceec0b7-268e-13aa-f809-b64a3f98c0b5_dm.csv']


Here we have a minute recording from a device with the unique installation ID 'fceec0b7-268e...' in a format that is easy to read. 

The files within the zip are named for the device and information type, but do not include the submission number. They do, however, contain a datetime string in ISO format, which means name collisions are extremely unlikely. To rule out overwriting altogether, the files are unzipped within a folder named for that unique submission.
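
The file name layout can be split into its parts with a few lines. The sketch below assumes the pattern `<timestamp>_<installation ID>_<type>.csv`, inferred from the file names shown in this notebook. `parse_sensor_filename` is a hypothetical helper.

```python
def parse_sensor_filename(filename):
    """Split '<timestamp>_<installation ID>_<type>.csv' into its three parts.

    The layout is inferred from the file names in this notebook.
    """
    stem = filename[:-len('.csv')] if filename.endswith('.csv') else filename
    timestamp, installation_id, data_type = stem.split('_')
    return timestamp, installation_id, data_type


# File name copied from the output above
name = '2023-01-31T05-14-40.207Z_fceec0b7-268e-13aa-f809-b64a3f98c0b5_dm.csv'
print(parse_sensor_filename(name))
```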

Now to collect many at once: 

In [18]:
# pull in attachment files and unpack them

checkedSubs = 25

newSubs = 0 # count the submission sampled
tic = time.time()
for submis in subIDs[:checkedSubs]: # just getting 25 as a test
    # first find out the attachment file ID for this submission
    subID = str(submis['submissionId'])
    # if there is an IDed attachment for this submission, get the file
    for subm in submis['answers']:
        if 'attachments' in subm: # only answers that carry an attachment
            attID = str(subm['attachments'][0]['answerAttachmentId'])
            request_url = 'https://nettskjema.no/api/v2/submissions/' + subID + '/attachments/' + attID
            response = session.get(request_url)
            newSubs += 1
            att_dets = json.loads(response.content.decode())
            # write the decoded attachment into a zip file
            f=open('data.zip', 'wb')
            f.write(base64.b64decode(att_dets['content']))
            f.close()
            # and then unzip that file, leaving a uniquely titled csv, I hope
            with zipfile.ZipFile('data.zip', 'r') as zip_ref:
                if not os.path.exists(str(subID)):
                    os.mkdir(str(subID))
                    zip_ref.extractall('./'+str(subID)) # if not unique can use the unique submission IDs 

print('time to collect ' + str(newSubs) + ' attachments: ' + str(time.time() - tic))

time to collect 25 attachments: 2.039228916168213
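
For longer runs it may help to isolate the download step so that failures surface early. The function below is a sketch, not the notebook's method. It expects an object that behaves like the `requests.Session` set up earlier, and it assumes the API signals failures through the HTTP status code. `fetch_attachment_bytes` and `API_ROOT` are illustrative names.

```python
import base64

API_ROOT = 'https://nettskjema.no/api/v2'


def fetch_attachment_bytes(session, submission_id, attachment_id, timeout=30):
    """Download one attachment and return its decoded bytes.

    `session` should behave like the requests.Session configured earlier.
    """
    url = f'{API_ROOT}/submissions/{submission_id}/attachments/{attachment_id}'
    response = session.get(url, timeout=timeout)
    # Stop on HTTP errors instead of trying to decode an error body
    response.raise_for_status()
    details = response.json()
    return base64.b64decode(details['content'])
```

A call such as `fetch_attachment_bytes(session, subID, attID)` would then replace the request and decode lines inside the loop above.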


The extracted files can then be crawled to work out which submissions belong to which installations, i.e., which files can be stitched together in order.

## Compressing submission files
For some forms of storage and retrieval of unzipped data, this folder-per-submission arrangement is really awkward. The following shows two reorganisation schemes. The first moves the files from a long list of folders to a single folder while adding the submission number to filenames to preserve uniqueness. The second organises the files into unique installation folders.

In [59]:
os.chdir('..')
#os.mkdir('CompressedData')

In [21]:
folders = os.listdir('Test_API')
folders 

['25490506',
 '25490507',
 '25490508',
 '25490509',
 '25490512',
 '25490513',
 '25490514',
 '25490516',
 '25490517',
 '25490518',
 '25490520',
 '25490521',
 '25503764',
 '25507775',
 '25516198',
 '25516199',
 '25516200',
 '25516201',
 '25517587',
 '25541487',
 '25541490',
 '25541503',
 '25542763',
 '25558280',
 '25559872',
 'data.zip']

In [23]:
# to put all the files inside the CompressedData folder for fastest sftp transfer (SSH file transfer protocol / secure ftp) 
for subid in folders:
    if subid.startswith('2'):
        filenames = os.listdir('./Test_API/'+str(subid))
        #print(filenames)
        for fn in filenames:
            if fn.endswith('.csv'):
                sourcefile = './Test_API/'+str(subid) + '/' + fn
                targetfile = './CompressedData/'+str(subid) + '.' + fn
                #print(targetfile)
                shutil.copy2(sourcefile,targetfile)
                
os.listdir('./CompressedData/')

['25490506.2023-01-26T21-47-05.780Z_29692813-c50d-22a3-c078-e9b165eed306_dm.csv',
 '25490506.2023-01-26T21-47-05.780Z_29692813-c50d-22a3-c078-e9b165eed306_gl.csv',
 '25490507.2023-01-26T21-47-08.306Z_29692813-c50d-22a3-c078-e9b165eed306_dm.csv',
 '25490507.2023-01-26T21-47-08.306Z_29692813-c50d-22a3-c078-e9b165eed306_gl.csv',
 '25490508.2023-01-26T21-47-11.221Z_29692813-c50d-22a3-c078-e9b165eed306_dm.csv',
 '25490508.2023-01-26T21-47-11.221Z_29692813-c50d-22a3-c078-e9b165eed306_gl.csv',
 '25490509.2023-01-26T21-47-15.351Z_29692813-c50d-22a3-c078-e9b165eed306_dm.csv',
 '25490509.2023-01-26T21-47-15.351Z_29692813-c50d-22a3-c078-e9b165eed306_gl.csv',
 '25490512.2023-01-26T21-47-21.607Z_29692813-c50d-22a3-c078-e9b165eed306_dm.csv',
 '25490512.2023-01-26T21-47-21.607Z_29692813-c50d-22a3-c078-e9b165eed306_gl.csv',
 '25490513.2023-01-26T21-47-26.176Z_29692813-c50d-22a3-c078-e9b165eed306_dm.csv',
 '25490513.2023-01-26T21-47-26.176Z_29692813-c50d-22a3-c078-e9b165eed306_gl.csv',
 '25490514.2023-

To organise the files into folders per installation, we monitor and generate new folders as needed.

In [30]:
os.mkdir('InstOrdData') 


In [31]:
# to put all the files inside the Installation Ordered Data folder for fastest sftp transfer
foldlist = os.listdir('./InstOrdData/')

for subid in folders:
    if subid.startswith('2'):
        filenames = os.listdir('./Test_API/'+str(subid))
        #print(filenames)
        for fn in filenames:
            if fn.endswith('.csv'):
                # extract the installation ID 
                #subdets = fn.split('.') # worked before but now with the time, this was splitting at the ms level and the ms was being included with the installation ID
                subdets = fn.split('_') # splits it into date, installation, type (dm.csv or gl.csv)
                instid = subdets[1] # changed from Finn's because Pedro includes the date and time in the submission name!
                # if the device doesn't have a folder, generate one
                if instid not in foldlist:
                    os.mkdir('./InstOrdData/' + instid)
                    foldlist = os.listdir('./InstOrdData/')
                    
                sourcefile = './Test_API/' + str(subid) + '/' + fn
                targetfile = './InstOrdData/' + instid + '/'+str(subid) + '.' + fn
                #print(targetfile)
                shutil.copy2(sourcefile,targetfile)
                
os.listdir('./InstOrdData/')

['0ca55e4f-7f76-08b0-7aaa-e56d70981de4',
 '17cf4405-791f-088b-39ff-b816f9eab2ce',
 '29692813-c50d-22a3-c078-e9b165eed306',
 '4e53851d-962c-55a9-36eb-388f5cb8a0fa',
 '6f4f3465-d7fe-ae8b-717a-31db209e2b51',
 'a991cbae-01d5-de82-7eaf-1bda8c6a08d4',
 'ec41dd5d-5162-f6ad-8622-10c936895910',
 'fa0c30a6-f531-5eab-d1c3-b68407d76d42',
 'fceec0b7-268e-13aa-f809-b64a3f98c0b5']

Now, if we need to use sftp to move the data, we don't need to crawl through thousands of folders to find it. 

For the LIVELab Concert project, we only want data collected in close temporal proximity to September 23rd, 7:30-9:30 p.m. Luckily, the date is stored in the file name in UTC. createdDate, on the other hand, stores the time stamp in UTC+2, hence the +0200 at the end (summer time in Oslo).

Therefore I need to go through the files and only retrieve an attachment if it contains that date at the beginning of the filename. The concert began at 2022-09-23T19:30:00.000-0400 = 2022-09-24T01:30:00.000+0200

In the subIDs list, the first submission is 23172339 with 'createdDate': '2022-09-24T01:30:02.000+0200'

The last submission on September 24th was 23187860 with created date: 2022-09-24T21:22:18.000+0200
However that is too late for the concert end therefore, the last reasonable time is perhaps 2 hours after concert start and would then be around '2022-09-24T03:30:02.000+0200'

This submission corresponds to ID 23181108.

Therefore the range in IDs is probably 23172339-23181108.
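
The time zone arithmetic above can be checked with timezone-aware datetimes, which also give a way to test whether a `createdDate` falls inside a window. This is a sketch: the format string is inferred from the dates in this notebook, and the two-hour window mirrors the estimate above.

```python
from datetime import datetime, timedelta

FMT = '%Y-%m-%dT%H:%M:%S.%f%z'  # assumed layout of createdDate strings

concert_start = datetime.strptime('2022-09-23T19:30:00.000-0400', FMT)
oslo_start = datetime.strptime('2022-09-24T01:30:00.000+0200', FMT)

# Aware datetimes compare by instant, not by wall-clock text
same_instant = concert_start == oslo_start

window_end = concert_start + timedelta(hours=2)


def in_window(created_date, start=concert_start, end=window_end):
    """True if a createdDate string falls inside [start, end]."""
    return start <= datetime.strptime(created_date, FMT) <= end


print(same_instant)
print(in_window('2022-09-24T01:30:02.000+0200'))  # just after concert start
print(in_window('2022-09-24T21:22:18.000+0200'))  # last submission that day
```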


In [37]:
formID = 284191
# Audience Experience Study: September 23, 2022 7:30pm. The concert started slightly late so we can safely assume that this captures meaningful motion data
# 2022-09-23T19:30:00-04:00 # -4 because summer time is UTC-4h with daylight savings time/summer time
# 2022-09-23T19%3A30%3A00.000%2D0400
date = '2022-09-23T19%3A30%3A00.000%2D0400'

request_url = 'https://nettskjema.no/api/v2/forms/' + str(formID) + '/submissions?fromDate=' + date

response = session.get(request_url)
subIDs = json.loads(response.content.decode())

print('Submissions since date ' + date + ': ' + str(len(subIDs)))
if len(subIDs)<5:
    print(subIDs)
else:
    print('Most recent submissions: '+ str(subIDs[:2])) # Note: this shows the last 2 submissions, not the first two submissions
    print('Earliest submissions after concert start: ' + str(subIDs[-2:])) # Note: these are the first two submissions from concert start. 

Submissions since date 2022-09-23T19%3A30%3A00.000%2D0400: 10572
Most recent submissions: [{'submissionId': 25559872}, {'submissionId': 25558280}]
Earliest submissions after concert start: [{'submissionId': 23172340}, {'submissionId': 23172339}]


In [38]:
# get metadata on submissions for the music lab app after a certain submission ID

formID = 284191
submissionID = 23172339 # collect responses from after this submission ID. 
request_url = 'https://nettskjema.no/api/v2/forms/' + str(formID) + '/submissions?fromSubmissionId=' + str(submissionID)
response = session.get(request_url)
subIDs = json.loads(response.content.decode())
print('Submissions since subID ' + str(submissionID) + ': ' + str(len(subIDs)))
if len(subIDs)<5: # print error message or top responses 
    print(subIDs)
else:
    print(subIDs[0]) # print the most recent submission
    print(subIDs[-1]) # print the earliest submission after the concert start (after response 23172339, the threshold at '2022-09-23T19%3A30%3A00.000%2D0400')


Submissions since subID 23172339: 10571
{'submissionId': 25559872, 'createdDate': '2023-01-31T06:16:01.000+0100', 'modifiedDate': '2023-01-31T06:16:01.000+0100', 'delivered': True, 'answerTime': 0, 'answers': [{'answerId': 144609622, 'questionId': 4870950, 'textAnswer': 'data.zip', 'attachments': [{'answerAttachmentId': 482235, 'fileName': 'data.zip', 'mediaType': 'application/zip', 'size': 65663}]}, {'answerId': 144609621, 'questionId': 4870947, 'textAnswer': 'fceec0b7-268e-13aa-f809-b64a3f98c0b5'}]}
{'submissionId': 23172340, 'createdDate': '2022-09-24T01:30:02.000+0200', 'modifiedDate': '2022-09-24T01:30:02.000+0200', 'delivered': True, 'answerTime': 0, 'answers': [{'answerId': 132652669, 'questionId': 4870950, 'textAnswer': 'data.zip', 'attachments': [{'answerAttachmentId': 422499, 'fileName': 'data.zip', 'mediaType': 'application/zip', 'size': 295698}]}, {'answerId': 132652670, 'questionId': 4870947, 'textAnswer': '0eceb1a9-9ba3-a39b-2e8e-f32a5887fa40'}]}


In [61]:
os.chdir('..') # back to parent (i.e. LIVElab folder)
os.chdir('Concert_API')

In [62]:
# pull in attachment files from the date of the concert and unpack them

newSubs = 0 # count the submission sampled
tic = time.time()
for submis in subIDs: 
    # first find out the attachment file ID for this submission
    # keep submissions created 01:00-03:59 Oslo time, i.e. 19:00-21:59 at the concert venue (UTC-4). Since subIDs already starts at the concert start (19:30), this covers the concert plus roughly 30 minutes after its end. Future analysis should work to only select the real data.
    if submis['createdDate'].startswith(('2022-09-24T01', '2022-09-24T02', '2022-09-24T03')):
        subID = str(submis['submissionId'])
        # if there is an IDed attachment for this submission, get the file
        for subm in submis['answers']:
            if 'attachments' in subm: # only answers that carry an attachment
                attID = str(subm['attachments'][0]['answerAttachmentId'])
                request_url = 'https://nettskjema.no/api/v2/submissions/' + subID + '/attachments/' + attID
                response = session.get(request_url)
                newSubs += 1
                att_dets = json.loads(response.content.decode())
                # write the decoded attachment into a zip file
                f=open('data.zip', 'wb')
                f.write(base64.b64decode(att_dets['content']))
                f.close()
                # and then unzip that file, leaving a uniquely titled csv, I hope
                with zipfile.ZipFile('data.zip', 'r') as zip_ref:
                    if not os.path.exists(str(subID)):
                        os.mkdir(str(subID))
                        zip_ref.extractall('./'+str(subID)) # if not unique can use the unique submission IDs 

print('time to collect ' + str(newSubs) + ' attachments: ' + str(time.time() - tic))

time to collect 8236 attachments: 736.8054723739624


When I used a two-day date range, `if submis['createdDate'].startswith('2022-09-23') or submis['createdDate'].startswith('2022-09-24'):`, I got:

time to collect 8239 attachments: 765.3608708381653

When I used the 3-hour time range, `if submis['createdDate'].startswith('2022-09-24T01') or submis['createdDate'].startswith('2022-09-24T02') or submis['createdDate'].startswith('2022-09-24T03'):`, I got:

time to collect 8236 attachments: 736.8054723739624

In [10]:
os.chdir('..') # back to parent (i.e. LIVElab folder)
os.mkdir('Concert_InstOrdData') 


In [11]:
folders = os.listdir('Concert_API')

In [14]:
# to put all the files inside the Installation Ordered Data folder for fastest sftp transfer
foldlist = os.listdir('./Concert_InstOrdData/')

for subid in folders:
    if subid.startswith('23'):
        filenames = os.listdir('./Concert_API/'+str(subid))
        #print(filenames)
        for fn in filenames:
            if fn.endswith('.csv'):
                # extract the installation ID 
                #subdets = fn.split('.') # worked before but now with the time, this was splitting at the ms level and the ms was being included with the installation ID
                subdets = fn.split('_') # splits it into date, installation, type (dm.csv or gl.csv)
                instid = subdets[1] # changed from Finn's because Pedro includes the date and time in the submission name.
                # if the device doesn't have a folder, generate one
                if instid not in foldlist:
                    os.mkdir('./Concert_InstOrdData/' + instid)
                    foldlist = os.listdir('./Concert_InstOrdData/')
                    
                sourcefile = './Concert_API/' + str(subid) + '/' + fn
                targetfile = './Concert_InstOrdData/' + instid + '/'+str(subid) + '.' + fn
                #print(targetfile)
                shutil.copy2(sourcefile,targetfile)
                
#os.listdir('./InstOrdData/')

There are 139 unique installation IDs from that 3-hour time range. This should include both the live and livestreaming audience.
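
One way such a count could be obtained is to collect the installation ID from each file name into a set. The sketch below uses three file names that appear earlier in this notebook, not the concert data, so its result is unrelated to the 139 above.

```python
# File names copied from earlier outputs in this notebook
file_names = [
    '2023-01-26T21-47-05.780Z_29692813-c50d-22a3-c078-e9b165eed306_dm.csv',
    '2023-01-26T21-47-05.780Z_29692813-c50d-22a3-c078-e9b165eed306_gl.csv',
    '2023-01-31T05-14-40.207Z_fceec0b7-268e-13aa-f809-b64a3f98c0b5_dm.csv',
]

# The installation ID is the second '_'-separated part of each name
installation_ids = {name.split('_')[1] for name in file_names}
print(len(installation_ids))
```

With the folder layout built above, counting the entries of `os.listdir('./Concert_InstOrdData/')` should give the same kind of number.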