Publishing Sample Data

Nate Weisz edited this page Apr 5, 2018 · 4 revisions

Overview

This document describes how to configure Herd for sample data, how to upload sample data, and how users can access sample data.

Sample data in Herd is provided as a file by the owner or publisher of the data, rather than being scraped from the actual data by Herd. This approach was chosen for the following reasons:

  • allows the sample data to be a truly representative sample as opposed to getting the top n rows
  • allows the sample data to be scrubbed or obfuscated if it contains confidential data or PII
  • eliminates the complexity of Herd reading data of various formats from various physical locations with various required permissions

Configuring Herd for sample data

The Herd Administrator must perform the following steps to configure Herd for sample data:

  • Create Bucket and Storage
    • Create an S3 bucket to store the sample data files
    • Create a Herd Storage (with https://your-herd-hostname/herd-app/docs/rest/index.html#!/Storage/Storage_createStorage) to represent the bucket, with the following settings:
      • The Storage Name must be S3_MANAGED_SAMPLE_DATA
      • The following attributes (name = value) must be provided
        • bucket.name = name of S3 bucket you created for sample data files
        • download.role.arn = fully qualified arn for an IAM role that has permissions to download from the S3 bucket
        • upload.role.arn = fully qualified arn for an IAM role that has permissions to upload to the S3 bucket
        • download.session.duration = number of seconds that temporary download credentials will be valid
        • key.prefix.velocity.template = $namespace/$businessObjectDefinitionName
        • s3.region.name = AWS region such as us-east-1
  • Create SQS queue used for S3 notification
    • Create an SQS queue.
      • Select the Permissions tab of the queue, click 'Edit Policy Document', then add a policy that allows the action SQS:SendMessage from the S3 bucket to the queue. See the example under this issue for details.
      • Note: the IAM role that Herd runs as must have permission to select and read messages from this queue
    • Add configuration values in the Herd configuration table with name=value of:
      • sample.data.sqs.queue.name = name of the SQS queue you created
      • sample.data.jms.listener.enabled = true
  • Configure Bucket to send notifications
    • In the AWS console, navigate to the properties of the bucket you created for sample data
    • Go to Advanced Settings - Events and create a new notification that sends the POST, PUT, Copy, and Complete Multi-part Upload events to the SQS queue you created in the previous step
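
The configuration values above can be written down as plain data. The following Python is a minimal sketch, not official Herd or AWS tooling: the function names are hypothetical, the name/value shape of the attribute list is an assumption about the Create Storage request body, and the queue policy follows the general pattern AWS documents for letting an S3 bucket send messages to an SQS queue. The example in the issue referenced above remains the authoritative one.

```python
import json


def storage_attributes(bucket, download_role_arn, upload_role_arn,
                       session_seconds, region):
    """Return the S3_MANAGED_SAMPLE_DATA attributes as name/value pairs.

    The name/value dictionary shape is an assumption for illustration.
    """
    values = {
        "bucket.name": bucket,
        "download.role.arn": download_role_arn,
        "upload.role.arn": upload_role_arn,
        "download.session.duration": str(session_seconds),
        "key.prefix.velocity.template": "$namespace/$businessObjectDefinitionName",
        "s3.region.name": region,
    }
    return [{"name": name, "value": value} for name, value in values.items()]


def queue_policy(queue_arn, bucket):
    """Build an SQS policy that lets the given S3 bucket send messages."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "s3.amazonaws.com"},
            "Action": "SQS:SendMessage",
            "Resource": queue_arn,
            # Restrict senders to the sample data bucket only.
            "Condition": {"ArnLike": {"aws:SourceArn": f"arn:aws:s3:*:*:{bucket}"}},
        }],
    }


# S3 event types matching POST, PUT, Copy, and Complete Multi-part Upload.
NOTIFICATION_EVENTS = [
    "s3:ObjectCreated:Post",
    "s3:ObjectCreated:Put",
    "s3:ObjectCreated:Copy",
    "s3:ObjectCreated:CompleteMultipartUpload",
]

if __name__ == "__main__":
    # Placeholder ARN and bucket name; substitute your own values.
    policy = queue_policy("arn:aws:sqs:us-east-1:123456789012:sample-data-queue",
                          "my-sample-data-bucket")
    print(json.dumps(policy, indent=2))
```

The printed policy can be pasted into the queue's policy editor after replacing the placeholder ARN and bucket name.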

Upload sample data for a Business Object Definition

Now that Herd is configured for sample data, users can upload a sample data file for a Business Object Definition (BDef) using the following steps.

Steps for user to upload sample data file

  1. The user initiates the upload by calling https://your-herd-hostname/herd-app/docs/rest/index.html#!/Upload_and_Download/UploadandDownload_initiateUploadSampleFile and providing the Namespace and BDef Name

    • If User-Namespace authorization is active, only users with WRITE permissions to the provided Namespace can complete this operation
  2. The response contains temporary credentials, an S3 endpoint, a bucket name, and a prefix. The user uses this information to upload the sample data file directly to S3

    • Ensure the upload is initiated prior to the session expiration time in the response
  3. At this point, the user steps are complete. Herd performs additional steps as listed below.
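
Step 2 above can be sketched in Python. This is a hedged illustration rather than a Herd client: the function and parameter names are hypothetical, the values are assumed to have been copied from the initiate-upload response by the caller, and placing the file directly under the returned prefix is an assumption. The upload itself uses the third-party boto3 library, imported only when the upload function is called.

```python
import os
from datetime import datetime, timezone


def object_key(key_prefix, file_name):
    """Join the prefix from the response with the local file name."""
    return key_prefix.rstrip("/") + "/" + file_name


def credentials_valid(expiration, now=None):
    """Return True while the temporary credentials have not expired.

    Both datetimes must be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    return now < expiration


def upload_sample_file(path, bucket, key_prefix, access_key, secret_key,
                       session_token, expiration, region=None):
    """Upload one file to S3 with the temporary credentials (needs boto3)."""
    import boto3  # third-party; imported lazily so the helpers work without it

    if not credentials_valid(expiration):
        raise RuntimeError("temporary credentials expired; initiate the upload again")
    s3 = boto3.client(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        aws_session_token=session_token,
        region_name=region,
    )
    key = object_key(key_prefix, os.path.basename(path))
    s3.upload_file(path, bucket, key)
    return key
```

Checking the expiration before starting avoids beginning an upload with credentials that S3 would reject.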

Steps Herd takes in the background

  1. Herd waits for an S3 notification that a file has been placed in the sample data bucket
  2. Upon receiving a notification, Herd registers that file name with the BDef

This approach is taken for the following reasons:

  • The S3 bucket is not open; access is controlled by temporary credentials
  • End users need only call one API and then place the file in S3 with the provided temporary credentials
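
The background steps can be pictured as a listener that reads an S3 event notification from the queue and works out which BDef the file belongs to. The sketch below is not Herd's implementation. It assumes the standard S3 event notification JSON layout and an object key of the form namespace/businessObjectDefinitionName/fileName, and it ignores any case or character normalization Herd may apply to the prefix. The function names are hypothetical.

```python
import json
from urllib.parse import unquote_plus


def parse_s3_notification(message_body):
    """Return (bucket, key) pairs for object-created records in an S3 event."""
    event = json.loads(message_body)
    results = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # skip anything that is not a new object
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append((bucket, key))
    return results


def split_sample_key(key):
    """Split 'namespace/bdefName/fileName' into its three parts.

    Raises ValueError if the key has fewer than three segments.
    """
    namespace, bdef_name, file_name = key.split("/", 2)
    return namespace, bdef_name, file_name
```

A listener built this way would pass each parsed key to whatever registers the file name with the BDef.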

View sample data for a Business Object Definition

Users can view and/or download sample data files in two ways:

  1. In Herd-UI
    • Navigate to the Data Entity page for the BDef and click the 'Sample Data' button.
  2. Programmatically (these are the steps Herd-UI performs)
    • Call https://your-herd-hostname/herd-app/docs/rest/index.html#!/Upload_and_Download/UploadandDownload_initiateDownloadSingleSampleFile providing information for the desired BDef
    • Use the pre-signed URL returned in the response to download the file from S3
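
The last step of the programmatic route needs no AWS credentials, because the pre-signed URL carries its own authorization. A minimal sketch using only the Python standard library follows; the function name is hypothetical, and the URL is assumed to have already been taken from the download response.

```python
import shutil
import urllib.request


def download_presigned(url, dest_path):
    """Stream the content behind a pre-signed URL into a local file."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

Streaming with `shutil.copyfileobj` keeps memory use flat even for large sample files.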