# Loading data from the BCDA API

This notebook demonstrates how to load data from the Blue Button 2.0 Beneficiary Claims Data API (BCDA) and upload the JSON data to an AWS storage bucket. The data is then loaded and transformed into patient-level longitudinal data for analysis (see *notebookX*).

*The notebook largely follows the instructions available at this link: https://bcda.cms.gov/guide.html#try-the-api*

## Choosing a synthetic dataset
First we need to decide what size and type of synthetic dataset we want to download. We can choose from the following main dataset types (as outlined on the BCDA website):

**Simple Datasets - Five Sizes**
Five of these datasets are simple approximations of BCDA data designed for model entities to test the stress of retrieving and downloading large data files into their internal ingestion processes. These datasets are offered in various sizes so that organizations can test files with a number of beneficiaries that best matches the needs of their organization (the datasets range from 50 to 30,000 synthetic beneficiaries). However, the data in these API payloads will not reflect the distribution of disease and demographic information you might expect from production data.
- Sizes: 50, 500, 5,000, 15,000, 30,000 beneficiaries

**Advanced Datasets - Two Sizes**
These two datasets are designed to offer data that is a more accurate representation of BCDA production data. They follow BCDA’s Bulk FHIR format and should contain a more realistic distribution of disease and demographic information. The advanced datasets are offered in two sizes: Extra Small (100 beneficiaries) and Large (10,000 beneficiaries). We suggest using the Extra-Small Model Entity file to simply view and begin to understand the format of BCDA data. You may want to use the Large dataset for more in-depth exploration of the data or early load testing of your systems.
- Sizes: 100, 10,000 beneficiaries

*There are also partially adjudicated datasets -- but we won't be using these for now.*

As an initial step, we will download the Extra Small Advanced Dataset (100 beneficiaries) to understand the data format and structure. We will then proceed to download the Large Advanced Dataset (10,000 beneficiaries) for further analysis and practice.

## Step 1: Obtain an access token
To get an access token, we need to submit credentials to the token endpoint, for example via a cURL command. The credentials take the form of a client ID and a client secret, which are published here: https://bcda.cms.gov/guide.html#try-the-api. Each data type and size has its own client ID and client secret.

Here are the credentials for the Extra Small Advanced Dataset (100 beneficiaries):
- Client ID: `e75679c2-1b58-4cf5-8664-d3706de8caf5`
- Client Secret: `50eeab7d37a8bf17c8dad970116508f9656a1b0954fe9a467e4658643a4a877945a5096707da9e91`


### Sample cURL command to Submit Credentials for an Access Token
> curl -d "" -X POST "https://sandbox.bcda.cms.gov/auth/token" \
	--user 2462c96b-6427-4efb-aed7-118e20c2e997:825598c105bd1fe021c9eb9d41b30e82beb7a505a1184282e69891f76aa0a396dc9d20f35c9df4a5 \
	-H "accept: application/json"

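Rather than running cURL directly, we will make the same request from Python, so we first need to import the **requests**, **json**, and **base64** libraries. Note that the sample command above uses a different client ID and secret from the Extra Small Advanced credentials listed earlier; the Python cell uses the Extra Small Advanced credentials.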

In [6]:
# Import the necessary libraries
import requests
import json
import base64

In [7]:
# We'll use the requests library to make a POST request to the API.
# It'll be structured a bit differently from the sample cURL command above:
# cURL's --user flag builds the Basic auth header for us, whereas here we build it by hand.

# Define the URL
bcda_url = "https://sandbox.bcda.cms.gov/auth/token"

# Define the client ID and secret
client_id = "e75679c2-1b58-4cf5-8664-d3706de8caf5"
client_secret = "50eeab7d37a8bf17c8dad970116508f9656a1b0954fe9a467e4658643a4a877945a5096707da9e91"

# Encode Client ID and Secret
credentials = f"{client_id}:{client_secret}"
encoded_credentials = base64.b64encode(credentials.encode('utf-8')).decode('utf-8')

headers = {
    "accept": "application/json",
    "Authorization": f"Basic {encoded_credentials}"
}

response = requests.post(bcda_url, headers=headers)

print(response.text)



{"access_token": "eyJhbGciOiJSUzUxMiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3MjE2ODI0MTYsImp0aSI6IjE4YjI3YTk2LWNjMzgtNGFiMy1hZDQ0LWRmNWQ0ZDFmMjZmNCIsImlhdCI6MTcyMTY4MTIxNiwiaXNzIjoic3NhcyIsInVzZSI6IkFjY2Vzc1Rva2VuIiwiY2lkIjoiZTc1Njc5YzItMWI1OC00Y2Y1LTg2NjQtZDM3MDZkZThjYWY1Iiwic3lzIjoiMzIiLCJkYXQiOiJ7XCJjbXNfaWRzXCI6IFtcIkE5OTk4XCJdfSJ9.VkDMlN2EcFr2V2AQmODGelb3SRoDwTLnGYKsMYpmXtjXcxBXCSV0d4NjP93N5QGF69q65idxW0LYGTueuc_EQwi9sdpbZE9S3u4iN4yzFVHGm--5AAtimw5t6rys4cGPabBH9cDBDX6gR1Csuan9f0eT-kSNQc3Ep72OeQKoAWH5XZheSD3Fw5rpxdJKTaFHj2G-BzwjZ8ruQ4iIMCQ5ajuMbYvQrTXVPabDggvACRAtGr7DZnT-qTyblTrOd1BNhQOmfLodJeyY54DBkeuGoifyhqe0_1YQiBsfaXnf0LeOCA_bi6TIbibAfGq-wHckSICrJ48yOIhHBsiD2yFB5Q", "expires_in": "1200", "token_type":"bearer"}
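The response is a JSON object with three fields: `access_token`, `expires_in` (returned as a string, here `"1200"` seconds), and `token_type`. The sketch below shows one way to turn that body into a header for later requests and to keep track of when the token lapses. It assumes that later calls send the token as `Authorization: Bearer <token>`, which is suggested by `token_type` being `bearer` but is not shown in this section. The function name `bearer_headers_from_response` and the placeholder token are hypothetical and used only for illustration.

```python
import json
import time


def bearer_headers_from_response(response_text, issued_at=None):
    """Return (headers, expires_at) built from the token endpoint's JSON body.

    Hypothetical helper: assumes the Bearer scheme is used for later requests.
    """
    payload = json.loads(response_text)
    # "expires_in" is a string ("1200") in the output above, so cast it to int.
    lifetime_seconds = int(payload["expires_in"])
    # Default to "now" if the caller does not say when the token was issued.
    issued_at = time.time() if issued_at is None else issued_at
    headers = {"Authorization": f"Bearer {payload['access_token']}"}
    return headers, issued_at + lifetime_seconds


# Placeholder body with the same shape as the output above (not a real token).
sample_body = (
    '{"access_token": "PLACEHOLDER.TOKEN.VALUE", '
    '"expires_in": "1200", "token_type": "bearer"}'
)
headers, expires_at = bearer_headers_from_response(sample_body, issued_at=0)
print(headers["Authorization"])
print(expires_at)
```

In the notebook itself, `response.text` from the cell above would be passed in place of `sample_body`. Comparing `time.time()` against `expires_at` before each call is a simple way to know when to request a fresh token.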
