# NDA Data Staging Web Services

The following slides describe the web services available for programmatically uploading and staging data that will subsequently be validated and submitted.

## Use Case

These services support real-time transactions where the user wishes to upload data as it is received or processed.  These transactions can be aggregated, validated, and submitted at a later time or according to a schedule (e.g., weekly data submissions).

## Authenticating with Web Services

Currently, NDA web services use HTTP basic authentication over HTTPS.  Each web service request that requires authentication must include the Authorization header.

**NOTE: Supported authentication methods may change in the future.**
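As a rough illustration of what the Authorization header contains under basic authentication, the sketch below builds it by hand. The helper name `basic_auth_header` is hypothetical and not part of any NDA package; in practice the `requests` library produces the same header when given `auth=(username, password)`.

```python
import base64


def basic_auth_header(username, password):
    """Build the HTTP Basic Authorization header for the given credentials.

    Hypothetical helper for illustration only.
    """
    # Basic auth is the base64 encoding of "username:password"
    credentials = "{}:{}".format(username, password).encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": "Basic " + encoded}


# Placeholder credentials, not a real account
headers = basic_auth_header("user", "pass")
print(headers)
```

Because base64 is an encoding and not encryption, a header like this should only ever be sent over HTTPS.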

In [1]:

# Record NDA Username / Password for services where authentication is required.

import getpass
username = input("Enter your NDAR username: ")
password = getpass.getpass("Enter your NDAR password: ")

Enter your NDAR username: obenshaindw
Enter your NDAR password: ········


## Federated Token Service

This service provides temporary session tokens that identify the NDA user within Amazon Web Services, assigning the user certain permissions for working with S3 data objects.

Please refer to https://github.com/NDAR/nda_aws_token_generator for examples of how to use the service and to obtain the Python package used in the following example.

In [2]:
# nda_aws_token_generator can be downloaded from https://github.com/NDAR/nda_aws_token_generator/tree/master/python
from nda_aws_token_generator import *

# URL for DataManager service
url = 'https://ndar.nih.gov/DataManager/dataManager'

# Token generator object
generator = NDATokenGenerator(url)

# Generate token
token = generator.generate_token(username, password)
print('aws_access_key_id=%s\n\n'
      'aws_secret_access_key=%s\n\n'
      'security_token=%s\n\n'
      'expiration=%s\n\n' 
      %(token.access_key,
        token.secret_key,
        token.session,
        token.expiration)
      )

aws_access_key_id=ASIAI3KBUYHRL5F5BHFQ

aws_secret_access_key=X6JKtnTOFvAQ4ZABOsH93ne/fsJQiaZP15tYvgM1

security_token=FQoDYXdzED0aDDW72FXA0rOxgifdCyLYAZvvIyWyjX1ms61mNuEMaNUN2YjLA9FK5sF9DusSe1HZEpw4rBIzNkEJeFvg9i7chVMAXQgT43UtMkXtEoH66Pd7IusOMTy6ho57jy0AFfBzfxgrTYj0vVZBiBAmpYXsznAclgaW5fyXJNh+e6xixR0xPyL19+GNwsmfvZ4rn5IlpqCknaUmUdDaGvvUr/yEMYqcagNotAr7NIxvsv1m0247l3vaj8nMRQhShiw3tOh3+tznrGZrVfwf33zoKIC4yd5fs1gHtLulkLCbpTqsBfJdUX4Gk29Oxyi3rMfKBQ==

expiration=2017-06-27T11:51:19-04:00
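Because the token is temporary, a long-running upload job may want to check how much time is left before reusing it. The sketch below assumes the expiration value is an ISO 8601 string with a UTC offset, as in the output above; the function names and the five-minute margin are illustrative choices, not part of the token generator package.

```python
from datetime import datetime, timezone


def seconds_until_expiry(expiration, now=None):
    """Return the seconds remaining before an ISO 8601 expiration timestamp."""
    expires_at = datetime.fromisoformat(expiration)
    if now is None:
        now = datetime.now(timezone.utc)
    return (expires_at - now).total_seconds()


def needs_refresh(expiration, margin_seconds=300, now=None):
    """True when the token expires within margin_seconds (or already has)."""
    return seconds_until_expiry(expiration, now) < margin_seconds


# Example with a fixed "current time" so the result is reproducible
fixed_now = datetime(2017, 6, 27, 15, 0, 0, tzinfo=timezone.utc)
remaining = seconds_until_expiry("2017-06-27T11:51:19-04:00", now=fixed_now)
print(remaining)
```

When `needs_refresh` returns True, a new token can be requested with `generator.generate_token(username, password)` as in the cell above.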




## Working with AWS S3 Object storage

Your session token identifies your NDA user within the AWS API.  By default, your user is granted access to get objects from datasets that have been shared within the NDA; all other objects are not accessible.

Your user will also be granted access to a named location within specific buckets, based on your NDA username, where your user has been granted full permissions to:
* list
* upload
* get

List and delete permission for users has not been granted.

In [3]:
# Your AWS API username will be {NDA AWS account number}:{NDA username}
aws_account = '618523879050'
aws_user = '%s:%s'%(aws_account,username)
print('Within specific buckets, you will have full access to the following location: %s/scratch'%(aws_user))

# User-Specific S3 storage
useable_location = '%s/scratch'%(aws_user)

Within specific buckets, you will have full access to the following location: 618523879050:obenshaindw/scratch


In [4]:
# Create a connection object for the S3 API using the previously generated token 

import boto3
from boto3.s3.transfer import S3Transfer
import botocore
# Note: the session token (not the secret key) must be passed as aws_session_token
s3 = boto3.session.Session(aws_access_key_id=token.access_key,
                           aws_secret_access_key=token.secret_key,
                           aws_session_token=token.session)
s3_client = s3.client('s3')
s3transfer = S3Transfer(s3_client)

In [5]:
bucket_name = input("Enter the name of the bucket you wish to access: ").strip()
key_name = input("Enter the name of the key you wish to access: ").strip()
response = s3_client.head_object(Bucket=bucket_name, Key=key_name)
print(response)

Enter the name of the bucket you wish to access: NDA_AURORA


In [6]:
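# List Objects in the bucket that are accessible to my user.
# The remaining S3 cells use the legacy boto (version 2) API, so create the `bucket` handle they rely on.
import boto
conn = boto.connect_s3(token.access_key, token.secret_key, security_token=token.session)
bucket = conn.get_bucket(bucket_name, validate=False)

for key in bucket.list(prefix=useable_location):
    print('\nObject:\n\tName: %s\n\tSize: %s'%(key.name, key.size))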

In [7]:
import os
from boto.s3.key import Key
import time, datetime

# Create a Timestamp
ts = time.time()
st = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')
print('Time is: \n%s'%(st))

# Uploading an Object

## Create a new AWS S3 Object within the bucket

new_object = Key(bucket)

## Give the key a name
name = input('What is the name of your new file: ').strip()
new_object.key = '%s/%s'%(useable_location, name)

## Create a file with text (timestamp)
with open(name, 'a') as local_file:
    local_file.write(st)

## Upload it
new_object.set_contents_from_filename(name)

Time is: 
2017-06-26 23:52:04
What is the name of your new file: NOTREAL-file-26June2017.csv


19

In [8]:
# List Objects in the bucket that are accessible to my user.

for key in bucket.list(prefix=useable_location):
    print('\nObject:\n\tName: %s\n\tSize: %s'%(key.name, key.size))


Object:
	Name: 618523879050:obenshaindw/scratch/NDARYR448VRJ-June-2017.csv
	Size: 38

Object:
	Name: 618523879050:obenshaindw/scratch/NOTREAL-file-26June2017.csv
	Size: 19

Object:
	Name: 618523879050:obenshaindw/scratch/new_file_for_upload.txt
	Size: 19

Object:
	Name: 618523879050:obenshaindw/scratch/test-file.zip
	Size: 0


In [9]:
# Getting an Object

## We will access the object we just created

contents = new_object.get_contents_as_string()
print('File Contents:\n %s'%(contents))

## Download the contents to a file
downloaded_file = input('Save file as: ').strip()
new_object.get_contents_to_filename(downloaded_file)

File Contents:
 b'2017-06-26 23:52:04'
Save file as: NOTREAL-downloaded.csv
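The list, upload, and get steps above use the legacy boto (version 2) objects. For readers who prefer to stay with the boto3 client created earlier (`s3_client`), the following is a minimal sketch of the same three operations. The helper names are hypothetical; the client methods used (`get_paginator`, `upload_file`, `download_file`) are standard boto3 S3 client calls, and the helpers accept the client as a parameter so they can be exercised without network access.

```python
import os


def list_objects(s3_client, bucket, prefix):
    """Yield (key, size) for every object stored under the given prefix."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]


def upload(s3_client, bucket, prefix, filename):
    """Upload a local file under the prefix and return the object key used."""
    key = "%s/%s" % (prefix, os.path.basename(filename))
    s3_client.upload_file(filename, bucket, key)
    return key


def download(s3_client, bucket, key, filename):
    """Download the object stored at key to a local file."""
    s3_client.download_file(bucket, key, filename)
```

With the variables defined in earlier cells, usage would look like `upload(s3_client, bucket_name, useable_location, name)` followed by `for key, size in list_objects(s3_client, bucket_name, useable_location): print(key, size)`.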


## miNDAR Import Service

A miNDAR (short for mini-NDAR) is a remote Oracle database that you control and can push data to.  A miNDAR can be created either by creating a data package and choosing the miNDAR option, or by requesting a miNDAR through the Help Desk: ndahelp@mail.nih.gov.  Each miNDAR table typically corresponds to one of the data dictionaries, which can be found here: https://ndar.nih.gov/data_dictionary.html.  Data structures from the dictionary are also accessible through a RESTful web service: https://ndar.nih.gov/api/datadictionary/v2/

The miNDAR web service currently provides the capability to POST data through a RESTful web service to a remote miNDAR. This service requires authentication to ensure you are the miNDAR owner.

Swagger User Interface - https://ndar.nih.gov/api/mindar


## Data Formats

The miNDAR import operation accepts data in either XML or JSON format. An XML Schema Definition is provided.

**NOTE:** *Format for Date types actually requires an Oracle timestamp in the form of MM/DD/YYYY 24H:MM:SS*

[submission.xsd](files/submission.xsd)
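The timestamp format in the note above (MM/DD/YYYY 24H:MM:SS) maps onto the `strftime` pattern `%m/%d/%Y %H:%M:%S`. A small sketch of converting Python `datetime` values to and from that form follows; the constant and function names are illustrative.

```python
from datetime import datetime

# MM/DD/YYYY 24H:MM:SS expressed as a strftime/strptime pattern
MINDAR_TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S"


def to_mindar_timestamp(value):
    """Format a datetime as MM/DD/YYYY 24H:MM:SS."""
    return value.strftime(MINDAR_TIMESTAMP_FORMAT)


def from_mindar_timestamp(text):
    """Parse a MM/DD/YYYY 24H:MM:SS string back into a datetime."""
    return datetime.strptime(text, MINDAR_TIMESTAMP_FORMAT)


formatted = to_mindar_timestamp(datetime(2017, 6, 26, 23, 52, 4))
print(formatted)  # 06/26/2017 23:52:04
```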

In addition to conforming to this schema definition, data are also checked against the data dictionary for correct data type and valid length (e.g., submitted data must not exceed the maximum size for the column).
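The same kinds of checks can be approximated locally before a POST to catch obvious problems early. The sketch below is not the service's validation logic. It assumes each dictionary element is a dict with a `type` of `Integer`, `Date`, or `String` (type names that appear in the sample output later in this document) and an optional `size` field giving the maximum length; the `size` field name is an assumption.

```python
from datetime import datetime


def check_value(element, value):
    """Return a list of problems found for one value (empty if none)."""
    problems = []
    text = str(value)
    kind = element.get("type")
    name = element.get("name", "?")

    if kind == "Integer":
        try:
            int(text)
        except ValueError:
            problems.append("%s: expected an integer" % name)
    elif kind == "Date":
        try:
            datetime.strptime(text, "%m/%d/%Y %H:%M:%S")
        except ValueError:
            problems.append("%s: expected MM/DD/YYYY 24H:MM:SS" % name)

    # 'size' is an assumed field name for the maximum column length
    size = element.get("size")
    if kind == "String" and size is not None and len(text) > size:
        problems.append("%s: longer than %d characters" % (name, size))
    return problems


print(check_value({"name": "interview_age", "type": "Integer"}, "abc"))
```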

Sample submission files are provided for one of the data structures in the NDA dictionary.

[aurora01.json](files/aurora01.json)

[aurora01.xml](files/aurora01.xml)

Finally, code is provided for generating sample XML/JSON messages for a given data structure.  Note that values used to generate these files are placeholder values and would need to be substituted with actual values intended for submission.

In [10]:
import requests
import json

class submission:
    
    def __init__(self, mindar_schema):
        self.schema = mindar_schema
        self.message = dict({'dataStructureRows': [], 'schemaName': mindar_schema})
        
    def get_xml(self, row):
        
        xml_message = '<?xml version="1.0" encoding="UTF-8"?>\n'
        xml_message += '<data_submission schemaName="{}">\n'.format(self.schema)
        xml_message += '\t<dataStructureRows>\n\t\t<dataStructureRow>\n'

        for element in row['dataElements']:
            xml_message += '\t\t\t<dataElement name="{}"><![CDATA[{}]]></dataElement>\n'.format(element['name'],element['value'])
        xml_message += '\t\t</dataStructureRow>\n</dataStructureRows></data_submission>'    
        return xml_message

class dataRow:
    
    def __init__(self, short_name, required = False):
        self.short_name = short_name
        self.required = required
        self.row = dict({'dataElements': [], 'shortName': short_name})
        self.get_dictionary()
        
    def get_dictionary(self):
        dd_api = 'https://ndar.nih.gov/api/datadictionary/v2'
        
        try:
            r = requests.get(dd_api +
                             '/datastructure/{}'.format(self.short_name),
                             headers={'Accept': 'application/json'})
            r.raise_for_status()
            ds = r.json()
        except (requests.RequestException, ValueError):
            print('Error retrieving {} from web service, check the dictionary name and try again.'.format(self.short_name))
            return

        for element in ds['dataElements']:
            if element['type'] == 'Date':
                element['type'] = 'Date: MM/DD/YYYY 24H:MM:SS'
            # When only required elements were requested, skip the others
            if self.required and element['required'] != 'Required':
                continue
            self.row['dataElements'].append({'name': element['name'], 'value': element['type']})
         
        

# Code to generate sample XML/JSON messages for the requested data structure

short_name = input('What data structure do you want example messages for (enter short_name)?')
required_only = input('Only include required elements (Yes, No) default is No?')
# Anything other than "Yes" (including an empty answer) falls back to the default, No
required = required_only.strip().lower() == 'yes'
mindar_schema = input('Enter the schema/user name for the miNDAR where you will POST data.')

sample_submission = submission(mindar_schema)
sample_row = dataRow(short_name,required)
sample_submission.message['dataStructureRows'].append(sample_row.row)
print('JSON:\n\n')
print(json.dumps(sample_submission.message, indent=2))
print('\n\nXML:\n\n')
print(sample_submission.get_xml(sample_row.row))

    


What data structure do you want example messages for (enter short_name)?aurora01
Only include required elements (Yes, No) default is No?No
Enter the schema/user name for the miNDAR where you will POST data.obenshaindw_107040
JSON:


{
  "dataStructureRows": [
    {
      "dataElements": [
        {
          "value": "GUID",
          "name": "subjectkey"
        },
        {
          "value": "String",
          "name": "src_subject_id"
        },
        {
          "value": "Date: MM/DD/YYYY 24H:MM:SS",
          "name": "interview_date"
        },
        {
          "value": "Integer",
          "name": "interview_age"
        },
        {
          "value": "String",
          "name": "gender"
        },
        {
          "value": "String",
          "name": "deviceserialnumber"
        },
        {
          "value": "String",
          "name": "startime1"
        },
        {
          "value": "String",
          "name": "stoptime"
        },
        {
          "value": "S

In [11]:
import requests
from getpass import getpass

username = input("What is your NDA username:")
password = getpass("What is your NDA password:")

#POST data in XML format

with open('files/aurora01.xml', 'r') as xml_file:
    xml_data = xml_file.read()

r = requests.post("https://ndar.nih.gov/api/mindar/import", 
                 auth=requests.auth.HTTPBasicAuth(username, password),
                 headers={'content-type':'application/xml'},
                 data = xml_data)
print(r.text)

# POST data in JSON format

with open('files/aurora01.json', 'r') as json_file:
    json_data = json_file.read()

r = requests.post("https://ndar.nih.gov/api/mindar/import", 
                 auth=requests.auth.HTTPBasicAuth(username, password),
                 headers={'content-type':'application/json'},
                 data = json_data)
print(r.text)

What is your NDA username:obenshaindw
What is your NDA password:········
{"status":"Successful Submission"}
{"status":"Successful Submission"}
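Instead of posting a prepared file, the JSON body can also be assembled from actual values. The sketch below mirrors the message layout printed by the sample generator (`schemaName`, `dataStructureRows`, `shortName`, `dataElements`); the function name `build_submission` and the example values are placeholders, not real data.

```python
import json


def build_submission(schema_name, short_name, values):
    """Build a one-row miNDAR import message from a name -> value mapping."""
    row = {
        "shortName": short_name,
        "dataElements": [
            {"name": name, "value": value} for name, value in values.items()
        ],
    }
    return {"schemaName": schema_name, "dataStructureRows": [row]}


# Placeholder values for illustration only
message = build_submission(
    "obenshaindw_107040",
    "aurora01",
    {
        "src_subject_id": "subject-001",
        "interview_date": "06/26/2017 23:52:04",
        "interview_age": 120,
    },
)
json_body = json.dumps(message, indent=2)
print(json_body)
```

The resulting `json_body` string could then be passed as `data=json_body` to the `requests.post` call shown in the cell above, with the `content-type` header set to `application/json`.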
