# Reading and Ingesting a JSON Input File

This notebook briefly shows how to read a JSON file into your Python code and ingest it into Beacon.

## Step 1: Read and parse the JSON file to verify its content

In [3]:
import json
import os

# Construct the file path
# This should work regardless of operating system
folder_name = "json-file"
file_name = "PAH_metadata.json"
file_path = os.path.join(folder_name, file_name)

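def read_json_file(file_path):
    """Return the parsed JSON content of file_path, or None if it cannot be read."""
    try:
        # JSON files are normally UTF-8, so set the encoding explicitly
        with open(file_path, 'r', encoding='utf-8') as file:
            data = json.load(file)
        return data
    except FileNotFoundError:
        print(f"Error: File '{file_path}' not found")
        return None
    except json.JSONDecodeError as error:
        # Include the parser message, which reports where the syntax error is
        print(f"Error: Invalid JSON format ({error})")
        return None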

In [4]:
# Read the JSON file
json_data = read_json_file(file_path)
json_data

{'datasetId': 'UNQ_1',
 'dataset': {'id': 'UNQ_1',
  'createDateTime': '2021-03-21T02:37:00-08:00',
  'dataUseConditions': {'duoDataUse': [{'id': 'DUO:0000042',
     'label': 'general research use',
     'version': '17-07-2016'}]},
  'description': 'Simulation set 1.',
  'externalUrl': 'http://example.org/wiki/Main_Page',
  'info': {},
  'name': 'Dataset with fake data',
  'updateDateTime': '2022-08-05T17:21:00+01:00',
  'version': 'v1.1'},
 'assemblyId': 'GRCH38',
 'individuals': [{'id': 'UNQ_1-1',
   'ethnicity': {'id': 'SNOMED:52075006', 'label': 'Congolese'},
   'geographicOrigin': {'id': 'SNOMED:223688001',
    'label': 'United States of America'},
   'interventionsOrProcedures': [{'procedureCode': {'id': 'NCIT:C79426',
      'label': 'Cancer Diagnostic or Therapeutic Procedure'}},
    {'procedureCode': {'id': 'NCIT:C64264',
      'label': 'Imaging Biomarker Analysis'}}],
   'karyotypicSex': 'XXY',
   'sex': {'id': 'SNOMED:407378000',
    'label': 'Surgically transgendered transse

We have successfully read and loaded our JSON file! You can now explore it and check that its content is correct. If the JSON is already Beacon-compliant, you can ingest it directly using the same method shown in the notebooks containing the Ingesting to Beacon tutorials (example: ETL-for-beacon-data-ingestion/ETL-ingesting-to-dataportal-beacon.ipynb).
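
One way to explore the loaded data is a quick structural check. The sketch below is only an illustration: the helper `summarize_json` is hypothetical, and the list of expected keys is an assumption taken from the top-level keys visible in the sample output above, not from the full Beacon schema.

```python
# Top-level keys seen in the sample output above. This list is an assumption,
# not the complete Beacon schema.
EXPECTED_TOP_LEVEL_KEYS = ("datasetId", "dataset", "assemblyId", "individuals")


def summarize_json(data):
    """Return a small report on the top-level structure of parsed JSON data."""
    if not isinstance(data, dict):
        # A JSON array or scalar at the top level has none of the expected keys
        return {
            "is_object": False,
            "missing_keys": list(EXPECTED_TOP_LEVEL_KEYS),
            "individual_count": 0,
        }
    missing = [key for key in EXPECTED_TOP_LEVEL_KEYS if key not in data]
    individuals = data.get("individuals", [])
    count = len(individuals) if isinstance(individuals, list) else 0
    return {"is_object": True, "missing_keys": missing, "individual_count": count}


# Small stand-in for json_data so the sketch runs on its own.
# "assemblyId" is left out on purpose to show how a missing key is reported.
sample = {
    "datasetId": "UNQ_1",
    "dataset": {"id": "UNQ_1", "name": "Dataset with fake data"},
    "individuals": [{"id": "UNQ_1-1"}],
}
report = summarize_json(sample)
print(report)
```

In the notebook you would call `summarize_json(json_data)` instead of using the stand-in dictionary. An empty `missing_keys` list only means the top-level keys are present; it does not prove the file is Beacon-compliant.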

## Step 2: Re-load the JSON and upload the raw file for ingestion

In [12]:
import json
import os
import requests

# Read the JSON file
folder_name = "json-file"
file_name = "PAH_metadata.json"
file_path = os.path.join(folder_name, file_name)

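# Parse the file only to inspect it; the upload below sends the raw file unchanged
with open(file_path, 'r', encoding='utf-8') as file:
    json_data = json.load(file)
print(json_data)  # If you want to see the content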

# Presigned POST details: replace each <...> placeholder with your own values
presigned_post = {
    "url": "<URL>",
    "fields": {
        "key": "<KEY>",
        "bucket": "<BUCKET>",
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": "<CREDENTIAL>",
        "X-Amz-Date": "<DATE>",
        "X-Amz-Security-Token": "<TOKEN>",
        "Policy": "<POLICY>",
        "X-Amz-Signature": "<SIGNATURE>"
    }
}

# Upload to S3: the presigned fields are sent as form data, followed by the file part
with open(file_path, "rb") as file:
    response = requests.post(
        presigned_post["url"],
        data=presigned_post["fields"],
        files={"file": file},
    )

if response.status_code == 204:
    print("File uploaded successfully!")
else:
    print("Failed to upload file!")
    print(f"Status Code: {response.status_code}")
    print(f"Response: {response.text}")

File uploaded successfully!
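
Because the `presigned_post` dictionary above is written with `<...>` placeholders, it can help to confirm that every one has been replaced before sending the request. The following is a minimal sketch of such a check, not part of the original workflow: the helper `find_unfilled_placeholders` and the example URL are hypothetical, and the pattern assumes placeholders look like `<KEY>` (uppercase letters or underscores between angle brackets).

```python
import re

# Matches values that are still template placeholders such as "<URL>" or "<KEY>".
PLACEHOLDER_PATTERN = re.compile(r"^<[A-Z_]+>$")


def find_unfilled_placeholders(presigned_post):
    """Return the names of entries whose value is still a <PLACEHOLDER>."""
    unfilled = []
    if PLACEHOLDER_PATTERN.match(str(presigned_post.get("url", ""))):
        unfilled.append("url")
    for name, value in presigned_post.get("fields", {}).items():
        if PLACEHOLDER_PATTERN.match(str(value)):
            unfilled.append(f"fields.{name}")
    return unfilled


# Hypothetical, partially filled dictionary used only to demonstrate the check.
example_post = {
    "url": "https://example-bucket.s3.amazonaws.com/",
    "fields": {
        "key": "<KEY>",
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "Policy": "<POLICY>",
    },
}
unfilled = find_unfilled_placeholders(example_post)
print(unfilled)
```

An empty list means no placeholder values remain. It does not check that the presigned values themselves are valid or unexpired.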