# Querying FHIR Data With PartiQL in Redshift

This notebook demonstrates how to use PartiQL in Redshift to analyze FHIR data stored in S3. It also uses the open source schema induction tool (https://github.com/awslabs/amazon-redshif-json-schema-induction) to generate CREATE TABLE DDL for Redshift over the JSON data.

## Step 1: Download the FHIR data and store it in S3

In [None]:
import json
import random
import string

def randomString(stringLength=10):
    """Generate a random string of lowercase letters with a fixed length."""
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for _ in range(stringLength))

# S3 bucket names are globally unique, so append a random suffix
bucket_name = "demo-partiql-" + randomString()
bucket_name
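
Before creating the bucket, it can help to sanity-check the generated name. The sketch below is a partial check only: it covers the S3 rules on length (3 to 63 characters), allowed characters (lowercase letters, digits, dots, hyphens), first and last character, and adjacent dots. The helper name `looks_like_valid_bucket_name` and the `candidate` value are hypothetical and not part of the original notebook.

```python
import re

# Partial check only: length, character set, first/last character.
# Other S3 naming rules (reserved prefixes, IP-address-like names) are not covered.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")

def looks_like_valid_bucket_name(name):
    """Return True if name passes the basic S3 bucket naming checks above."""
    return _BUCKET_RE.fullmatch(name) is not None and ".." not in name

# Hypothetical example value shaped like the names generated in this notebook
candidate = "demo-partiql-abcdefghij"
print(looks_like_valid_bucket_name(candidate))
```

In the notebook itself, the same function could be called on `bucket_name` right before the `aws s3 mb` cell.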

In [None]:
!aws s3 mb s3://$bucket_name

In [None]:
!ls -alh claims.json

In [None]:
# claims.json has no line breaks, so show only the first 1000 bytes to avoid overwhelming the browser
!head -c 1000 claims.json
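
Because the file is a single long line, a truncated dump is hard to read. As an alternative, here is a minimal sketch that parses JSON and prints only its shape (keys and value types) down to a chosen depth. The `summarize` helper and the inline `sample` document are hypothetical; the sample is a stand-in and does not reflect the actual contents of claims.json.

```python
import json

def summarize(value, max_depth=2, depth=0):
    """Return a compact description of a parsed JSON value's structure."""
    if isinstance(value, dict):
        if depth >= max_depth:
            return f"object({len(value)} keys)"
        return {k: summarize(v, max_depth, depth + 1) for k, v in value.items()}
    if isinstance(value, list):
        if not value or depth >= max_depth:
            return f"array({len(value)} items)"
        # Describe only the first element as a representative of the array
        return [summarize(value[0], max_depth, depth + 1)]
    return type(value).__name__

# Hypothetical stand-in document, not taken from claims.json
sample = json.loads(
    '{"resourceType": "Claim", "id": "c1", '
    '"item": [{"sequence": 1, "net": {"value": 10.5, "currency": "USD"}}]}'
)
summary = summarize(sample)
print(json.dumps(summary, indent=2))
```

For the real data, the same function could be applied to the result of `json.load` on claims.json, raising `max_depth` to see deeper levels.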

In [None]:
!aws s3 cp claims.json s3://$bucket_name/fhir/claims/claims.json

## Step 2: Download the Schema Induction Tool and run it on the data above

In [None]:
!ls -alh *.jar

In [None]:
!java -jar schema-induction-1.0.0.jar -h

### Now let's run the tool

In [None]:
%%bash -s $bucket_name --out output --err error

java -jar schema-induction-1.0.0.jar \
-i s3://$1/fhir/claims/claims.json \
-d claims.ddl \
-t fhir.Claims \
-l s3://$1/fhir/claims \
-r "us-east-2"  \
-a \
-s claims.schema.json \
-root Claim

In [None]:
print("output:", output, "error:", error)

In [None]:
# Let's review the generated DDL
!cat claims.ddl

In [None]:
# Let's display the induced schema for Claims
from IPython.display import JSON
# Use a context manager so the file handle is closed after loading
with open('claims.schema.json') as f:
    schema = json.load(f)
JSON(schema)
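
PartiQL reaches nested attributes with dot notation, so a flat list of paths can be handy when writing queries against the induced table. The following is a minimal sketch that walks any nested JSON structure and yields dotted paths, marking arrays with `[]`. It assumes nothing about the layout of claims.schema.json beyond it being JSON; the `iter_paths` helper and the `toy` structure are hypothetical and do not represent the tool's actual output format.

```python
def iter_paths(node, prefix=""):
    """Yield (path, leaf) pairs for every leaf of a nested dict/list structure."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_paths(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for child in node:
            # Mark array traversal so nested collections stand out in the path
            yield from iter_paths(child, f"{prefix}[]")
    else:
        yield prefix, node

# Hypothetical structure for illustration only
toy = {"id": "string", "item": [{"sequence": "int", "net": {"value": "double"}}]}
paths = list(iter_paths(toy))
for path, leaf in paths:
    print(path, "->", leaf)
```

In the notebook, `list(iter_paths(schema))` would produce the same kind of listing for the loaded schema. Empty objects and arrays yield no entries in this sketch.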