# **Information extraction via Amazon Textract**

*- [Adam Muhtar](mailto:adam.b.muhtar@gmail.com)*

---

[Amazon Textract](https://aws.amazon.com/textract/) is a machine learning service that automates the extraction of text, handwriting, and data from documents and images. The new Layout feature of its [AnalyzeDocument](https://docs.aws.amazon.com/textract/latest/dg/API_AnalyzeDocument.html) API enhances document processing by identifying layout elements such as paragraphs, titles, and headers, and organising them according to human reading patterns. This simplifies the handling of complex documents such as financial reports, medical transcriptions, and contracts by reducing the need for manual post-processing.

The [Layout](https://docs.aws.amazon.com/textract/latest/dg/layoutresponse.html) feature, introduced in September 2023 alongside an updated [Textractor](https://aws-samples.github.io/amazon-textract-textractor/) toolkit, significantly reduces the complexity of document processing workflows. It enables users to extract and store key elements more easily, improving the speed and accuracy of solutions for handling structured documents. Layout-aware text is also reported to improve the accuracy of downstream AI tasks, benefiting large language models in both abstractive and extractive tasks.


## **Table of Contents**

1. [Notebook setup](#section-1)
2. [Load AWS Session and create S3 client](#section-2)
3. [Upload documents to S3 Bucket](#section-3)
4. [Layout extraction using Amazon Textract Textractor toolkit](#section-4)
5. [Unpacking the Textract Document object](#section-5)

---

## 1. Notebook setup <a id="section-1"></a>

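This notebook runs on Python 3.12.6 and requires the following packages to be installed:
* `amazon-textract-textractor~=1.8.3`
* `mypy-boto3-s3~=1.35.22`
* `python-dotenv~=1.0.1`

The import cell below also uses `boto3`, `botocore`, and `tqdm`; install them separately if they are not already pulled in as dependencies of the packages above.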

An [Amazon Web Services (AWS)](https://aws.amazon.com) account is required to run this notebook, which involves the following:
* Signing up for an AWS account and creating users with administrative access.
* Setting up the [AWS Command Line Interface (CLI)](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).
* Granting programmatic access by setting credentials in the AWS credentials profile file on your local system.
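
Before running the rest of the notebook, it can help to confirm that the packages listed above are actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the original notebook.

```python
from importlib import metadata
from typing import Dict, List, Optional


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment
            versions[name] = None
    return versions


# Distribution names taken from the requirements list above
required = ["amazon-textract-textractor", "mypy-boto3-s3", "python-dotenv"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

This only reports what is installed; it does not compare against the pinned version ranges.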

In [1]:
# Standard library imports
import json
import logging
import logging.config
import os
from pathlib import Path
import textwrap
from typing import List, Union, Tuple

# Third party imports
from boto3 import Session
from botocore.exceptions import (
    BotoCoreError,
    ClientError,
    NoCredentialsError,
    PartialCredentialsError
)
from dotenv import dotenv_values
from mypy_boto3_s3 import S3Client
from textractor import Textractor
from textractor.data.constants import TextractFeatures
from textractor.data.text_linearization_config import TextLinearizationConfig
from textractor.entities.document import Document
from tqdm import tqdm

In [2]:
# Dictionary-based logging configuration
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "standard": {"format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"}
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "standard",
            "level": "WARNING"
        },
        "file": {
            "class": "logging.FileHandler",
            "filename": "app.log",
            "mode": "a",
            "formatter": "standard",
            "level": "WARNING"
        }
    },
    "loggers": {
        "": {  # root logger
            "handlers": ["console", "file"],
            "level": "WARNING",
            "propagate": True
        },
        "__main__": {  # logger for your main module
            "handlers": ["console", "file"],
            "level": "DEBUG",
            "propagate": False
        }
    }
}

# Load the configuration
logging.config.dictConfig(LOGGING_CONFIG)

# Get loggers
logger = logging.getLogger(__name__)

---

## 2. Load AWS Session and create S3 client <a id="section-2"></a>

We load an AWS session to authenticate and establish a secure connection with AWS services, so that we can make programmatic calls through the AWS CLI or directly against the AWS APIs. Authentication relies on your AWS access key and secret key, which are set up as follows:
1. Generate the access and secret keys from the [Identity and Access Management (IAM)](https://aws.amazon.com/iam/) page.
2. Set your AWS credentials with the access and secret keys by running the following command in your CLI: `aws configure`.
3. Select the desired default [region](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html) and output format (e.g. JSON).
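
The `aws configure` command stores the keys in an INI-style credentials file (by default `~/.aws/credentials`). As a rough illustration of that format, the sketch below parses a sample credentials string with the standard library. The helper name `read_profile` and the key values are placeholders, not real credentials and not part of the original notebook.

```python
import configparser
from typing import Dict

# Placeholder content mimicking the layout written by `aws configure`
SAMPLE_CREDENTIALS = """
[default]
aws_access_key_id = EXAMPLEKEYID
aws_secret_access_key = exampleSecretKey
"""


def read_profile(text: str, profile: str = "default") -> Dict[str, str]:
    """Return the key-value pairs stored under one profile section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(profile):
        raise KeyError(f"Profile '{profile}' not found")
    return dict(parser[profile])


profile = read_profile(SAMPLE_CREDENTIALS)
print(sorted(profile))
```

In practice boto3 reads this file itself, so parsing it by hand is only useful for checking that a profile exists.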

The access key and secret key should be stored in your `.env.secret` file and must not be shared publicly. Each `.env` file is a series of key-value pairs delimited by an `=` sign, e.g.:
```
AWS_ACCESS_KEY="abcdef12345"
```

See the existing `.env.shared` file in the `.config` folder for an example.
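
To make the key-value format concrete, here is a minimal sketch of how such files could be parsed and merged. It is a simplified stand-in written for illustration, not the parser that `python-dotenv` actually uses, and the helper name `parse_env` and the secret value are hypothetical. The merge mirrors the dictionary unpacking used in the next cell, where values from the secret file override values from the shared file.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


shared = parse_env('# shared settings\nAWS_REGION="us-east-1"\n')
secret = parse_env('AWS_ACCESS_KEY="abcdef12345"\nAWS_SECRET_KEY="hypothetical-secret"\n')

# Later dictionaries win on duplicate keys, so secrets override shared values
config = {**shared, **secret}
print(sorted(config))
```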

In [3]:
def load_aws_session(
    config_dir: Union[str, Path, os.PathLike]
) -> Session:
    """
    Load an AWS session using credentials from environment files. Clients
    (e.g. S3) are created from the returned session by the caller.

    Args:
        config_dir (`str`, `Path`, or `os.PathLike`): The directory where the
            environment files are located. Can be a string, Path object, or
            os.PathLike.

    Returns:
        `Session`: AWS session object.

    Raises:
        `FileNotFoundError`: If the environment files are not found.
        `KeyError`: If the necessary credentials are missing in the environment files.
        `Exception`: If any other error occurs during session creation.
    """
    try:
        # Convert config_dir to a Path object for uniform handling
        config_dir = (
            Path(config_dir) if not isinstance(config_dir, Path) else config_dir
        )

        # Define paths to shared and secret environment files
        shared_env = config_dir / ".env.shared"
        secret_env = config_dir / ".env.secret"

        # Check if environment files exist before loading
        if not shared_env.is_file():
            raise FileNotFoundError(f"{shared_env} not found")
        if not secret_env.is_file():
            raise FileNotFoundError(f"{secret_env} not found")

        # Load environment variables from both files
        config = {
            **dotenv_values(shared_env),  # Load shared config
            **dotenv_values(secret_env)   # Load secret config
        }

        # Ensure necessary AWS keys are present
        aws_access_key = config.get("AWS_ACCESS_KEY")
        aws_secret_key = config.get("AWS_SECRET_KEY")

        if not aws_access_key or not aws_secret_key:
            raise KeyError(
                "AWS Access Key or Secret Key is missing in environment files"
            )

        # Create AWS session
        session = Session(
            aws_access_key_id=aws_access_key,
            aws_secret_access_key=aws_secret_key,
            region_name=config.get("AWS_REGION", "us-east-1")  # Default to "us-east-1" if region is missing
        )

        logging.info("AWS session successfully created")
        return session

    except FileNotFoundError as e:
        logging.error(f"Environment file not found: {e}")
        raise
    except KeyError as e:
        logging.error(f"Missing credentials in environment files: {e}")
        raise
    except (NoCredentialsError, PartialCredentialsError) as e:
        logging.error(f"AWS credentials error: {e}")
        raise
    except Exception as e:
        logging.error(f"An unexpected error occurred while loading AWS session: {e}")
        raise


# Load AWS session and create S3 client
try:
    config_dir = Path.cwd().parent / ".config"
    session = load_aws_session(config_dir)
    s3 = session.client("s3")
    logging.info("S3 client successfully created")
except Exception as e:
    logging.error(f"Failed to initialise AWS session: {e}")

---

## 3. Upload documents to S3 Bucket <a id="section-3"></a>

This section contains several helper functions to assist with S3 operations:

* `list_s3_buckets`: Lists all S3 buckets in the AWS account.
* `list_files_in_s3`: Lists all files in the specified S3 bucket.
* `file_check_in_s3`: Checks whether a file already exists in the S3 bucket.
* `generate_paths_and_s3_keys`: Generates the list of PDF files and corresponding S3 keys based on a glob pattern, searching recursively through all folders within the specified directory.
* `upload_to_s3`: Uploads multiple files to an S3 bucket, skipping files that already exist.
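
The key-generation step can be tried without AWS access. The sketch below reproduces the same naming rule (`<prefix>/<parent folder>/<file name>`) in a temporary directory; the helper name `s3_keys_for` and the folder and file names are made up for illustration. It also shows a consequence of the rule: only the immediate parent folder is kept, so deeper nesting is flattened.

```python
import tempfile
from pathlib import Path
from typing import List


def s3_keys_for(
    directory: Path, pattern: str = "*.pdf", prefix: str = "documents"
) -> List[str]:
    """Build S3 keys as <prefix>/<parent folder>/<file name> for matching files."""
    return sorted(
        f"{prefix}/{path.parent.name}/{path.name}"
        for path in directory.rglob(pattern)
        if path.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: one file directly under BankA, one nested under BankA/2023
    (root / "BankA" / "2023").mkdir(parents=True)
    (root / "BankA" / "report.pdf").touch()
    (root / "BankA" / "2023" / "annual.pdf").touch()
    keys = s3_keys_for(root)

print(keys)
# ['documents/2023/annual.pdf', 'documents/BankA/report.pdf']
```

Because the nested file loses its `BankA` component, two files with the same name and parent folder name in different branches would map to the same key.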

This notebook makes use of several publicly available Pillar 3 reports from major financial institutions, which can be found at the following links:
* [Barclays](https://home.barclays/investor-relations/)
* [Citigroup](https://www.citigroup.com/global/investors)
* [Goldman Sachs](https://www.goldmansachs.com/investor-relations/)
* [JPMorgan Chase & Co](https://www.jpmorganchase.com/ir)
* [Santander](https://www.santander.com/en/shareholders-and-investors)

In [218]:
def list_s3_buckets(s3: S3Client) -> List[str]:
    """
    Lists all S3 buckets in the AWS account.

    Args:
        * s3 (`S3Client`): The boto3 S3 client used to query the account.

    Returns:
        `List[str]`: A list of bucket names.
    """
    try:
        # Get the list of all buckets
        response = s3.list_buckets()

        # Extract the bucket names from the response
        buckets = [bucket["Name"] for bucket in response["Buckets"]]

        # Log the successful retrieval of bucket names
        logging.info(f"Retrieved {len(buckets)} buckets: {buckets}")

        return buckets

    except Exception as e:
        # Log the error, then re-raise the original exception
        logging.error(f"Error listing S3 buckets: {e}")
        raise


def list_files_in_s3(bucket: str) -> List[str]:
    """
    List all files in the specified S3 bucket.

    Args:
        * bucket (`str`): The name of the S3 bucket.

    Returns:
        `List[str]`: A list of file keys (paths) in the S3 bucket.
    """
    try:
        # Initialise an empty list to hold file keys
        file_keys = []

        # Use paginator to handle large number of files
        paginator = s3.get_paginator("list_objects_v2")
        
        # Paginate through the results
        for page in paginator.paginate(Bucket=bucket):
            for obj in page.get("Contents", []):
                file_keys.append(obj["Key"])

        logging.info(f"Retrieved {len(file_keys)} files from bucket '{bucket}'")
        return file_keys

    except (BotoCoreError, ClientError) as e:
        logging.error(f"Error listing files in S3 bucket '{bucket}': {e}")
        raise


def file_check_in_s3(bucket: str, s3_key: str) -> bool:
    """
    Check if a file already exists in the S3 bucket.

    Args:
        * bucket (`str`): The name of the S3 bucket.
        * s3_key (`str`): The S3 object key (path in the bucket).

    Returns:
        `bool`: True if the file exists in the bucket, False otherwise.
    """
    try:
        s3.head_object(Bucket=bucket, Key=s3_key)
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "404":
            return False
        else:
            logging.error(f"Error checking existence of {s3_key} in S3: {e}")
            raise


def generate_paths_and_s3_keys(
    directory: Union[str, Path], pattern: str = "*.pdf", prefix: str = "documents"
) -> Tuple[List[Path], List[str]]:
    """
    Generate the list of PDF files and corresponding S3 keys based on a glob
    pattern, recursively in all folders within the specified directory.

    Args:
        * directory (`str` or `Path`): The directory to search for files.
        * pattern (`str`): The glob pattern to match files. Default is "*.pdf"
            (all PDF files).
        * prefix (`str`): The prefix to use for S3 keys. Default is "documents".

    Returns:
        * `Tuple[List[Path], List[str]]`: A tuple containing two lists:
            - `paths`: A list of Path objects representing the matched files.
            - `s3_keys`: A list of S3 keys of the form
              `<prefix>/<parent folder name>/<file name>`.

    Raises:
        `ValueError`: If the specified directory does not exist.

    Example:
        ```python
        >>> paths, s3_keys = generate_paths_and_s3_keys("data", pattern="*.pdf", prefix="docs")
        >>> paths
        [Path('data/docs/folder1/file1.pdf'), Path('data/docs/folder2/file2.pdf')]
        >>> s3_keys
        ['docs/folder1/file1.pdf', 'docs/folder2/file2.pdf']
        ```
    """
    # Ensure the directory is a Path object
    directory = Path(directory) if isinstance(directory, str) else directory

    # Check if the directory exists
    if not directory.is_dir():
        raise ValueError(f"'{directory}' does not exist.")

    # Initialise lists for paths and S3 keys
    paths = []
    s3_keys = []

    # Recursively search for PDFs in all subdirectories
    for pdf_path in directory.rglob(pattern):
        if pdf_path.is_file():
            # Add the path to the list
            paths.append(pdf_path)

            # Get the parent folder name as the S3 key prefix
            folder_name = pdf_path.parent.name

            # Generate the S3 key with the folder name as the prefix
            s3_key = f"{prefix}/{folder_name}/{pdf_path.name}"
            s3_keys.append(s3_key)

    return paths, s3_keys


def upload_to_s3(
    files: List[Union[str, Path]], bucket: str, s3_keys: List[str]
) -> List[str]:
    """
    Uploads multiple files to an S3 bucket.

    Args:
        * files (`List[str` or `Path]`): A list of file paths to be uploaded. Each
            element can be a string or Path object.
        * bucket (`str`): The name of the S3 bucket.
        * s3_keys (`List[str]`): A list of S3 object keys corresponding to each file.

    Returns:
        `List[str]`: A list of result messages for each file upload (indicating
            success or failure).

    Raises:
        `FileNotFoundError`: If any of the specified files do not exist.
        `ValueError`: If any input arguments are invalid, such as mismatched
            lengths of `files` and `s3_keys`.
    """
    # Ensure the list of files and s3_keys are the same length
    if len(files) != len(s3_keys):
        raise ValueError("The number of files and S3 keys must be the same.")

    result_messages = []

    # Initialise tqdm progress bar
    with tqdm(total=len(files), desc="Uploading files", unit="file") as progress_bar:
        for file_path, s3_key in zip(files, s3_keys):
            # Convert to Path object if the input is a string
            file_path = Path(file_path) if isinstance(file_path, str) else file_path

            # Ensure the file exists
            if not file_path.is_file():
                raise FileNotFoundError(f"{file_path} does not exist.")
            
            # Check if the file already exists in the S3 bucket
            if file_check_in_s3(bucket, s3_key):
                skip_message = f"File {s3_key} already exists in S3, skipping upload."
                logging.info(skip_message)
                result_messages.append(skip_message)
            else:
                try:
                    # Upload the file to S3
                    s3.upload_file(str(file_path), bucket, s3_key)
                    success_message = f"Upload successful: {s3_key}"
                    result_messages.append(success_message)
                except (BotoCoreError, ClientError) as e:
                    error_message = f"Error uploading to S3 (Boto3 error): {e}"
                    logging.error(error_message)
                    result_messages.append(error_message)
                except Exception as e:
                    error_message = f"Unexpected error during S3 upload: {e}"
                    logging.error(error_message)
                    result_messages.append(error_message)

            # Update the progress bar after each upload
            progress_bar.update(1)

    return result_messages

In [19]:
# Upload files to S3
data_dir = Path.cwd().parent / "data"
paths, s3_keys = generate_paths_and_s3_keys(directory=data_dir)
# Use the first bucket in the account (adjust if the account holds several buckets)
s3_bucket_name = s3.list_buckets()["Buckets"][0]["Name"]
upload_to_s3(files=paths, bucket=s3_bucket_name, s3_keys=s3_keys)

Uploading files: 100%|██████████| 21/21 [00:03<00:00,  5.50file/s]


['File documents/Citigroup/b3p3d221231.pdf already exists in S3, skipping upload.',
 'File documents/Citigroup/b3p3d211231.pdf already exists in S3, skipping upload.',
 'File documents/Citigroup/b3p3d201231.pdf already exists in S3, skipping upload.',
 'File documents/Citigroup/b3p3dq4231231.pdf already exists in S3, skipping upload.',
 'File documents/Goldman-Sachs/gsguk-q4-2021-pillar-3.pdf already exists in S3, skipping upload.',
 'File documents/Goldman-Sachs/gsguk-q4-2020-pillar-3.pdf already exists in S3, skipping upload.',
 'File documents/Goldman-Sachs/gsguk-q4-2022-pillar-3.pdf already exists in S3, skipping upload.',
 'File documents/Goldman-Sachs/gsguk-q3-2023-pillar-3.pdf already exists in S3, skipping upload.',
 'File documents/Santander/irp-2023-irp-2023-en.pdf already exists in S3, skipping upload.',
 'File documents/Santander/irp-2020-irp-2020-en.pdf already exists in S3, skipping upload.',
 'File documents/Santander/irp-2021-irp-2021-en.pdf already exists in S3, skippi

---

## 4. Layout extraction using [Amazon Textract Textractor](https://aws-samples.github.io/amazon-textract-textractor) toolkit <a id="section-4"></a>

We use the [Amazon Textract Textractor](https://aws-samples.github.io/amazon-textract-textractor) toolkit's [`start_document_analysis`](https://aws-samples.github.io/amazon-textract-textractor/textractor.html?highlight=start_document_analysis) method to make an asynchronous API call with the `LAYOUT` feature enabled. The toolkit then exposes the detected layout elements through each page's `PAGE_LAYOUT` property and its sub-properties: `TITLES`, `HEADERS`, `FOOTERS`, `TABLES`, `KEY_VALUES`, `PAGE_NUMBERS`, `LISTS`, and `FIGURES`.

In this section, we use JPMorgan Chase's publicly available Pillar 3 report from Q4 2023; specifically, only the section of the report focussing on credit risk.
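
In the raw Textract response, layout elements appear as blocks whose `BlockType` starts with `LAYOUT_` (for example `LAYOUT_TITLE`, `LAYOUT_TEXT`, and `LAYOUT_TABLE`). As a quick sanity check on an extraction, the sketch below counts those blocks per type. The helper name `count_layout_blocks` and the sample dictionary are hypothetical; in this notebook the function would be applied to `document.response` once the extraction has run.

```python
from collections import Counter
from typing import Any, Dict


def count_layout_blocks(response: Dict[str, Any]) -> Counter:
    """Count blocks per layout type in a Textract-style response dictionary."""
    return Counter(
        block["BlockType"]
        for block in response.get("Blocks", [])
        if block.get("BlockType", "").startswith("LAYOUT_")
    )


# Hand-written stand-in for a real response, trimmed to the relevant field
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LAYOUT_TITLE"},
        {"BlockType": "LAYOUT_TEXT"},
        {"BlockType": "LAYOUT_TEXT"},
        {"BlockType": "LAYOUT_TABLE"},
        {"BlockType": "LINE"},
    ]
}

layout_counts = count_layout_blocks(sample_response)
print(layout_counts)
```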

In [79]:
# Initialise Textractor object; specify S3 bucket and PDF to extract
extractor = Textractor(region_name="us-east-1")
s3_bucket_name = s3.list_buckets()["Buckets"][0]["Name"]
document_name = "documents/JPMorgan/JPMorgan-Pillar-3-Report-Q42023-credit-risk.pdf"

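# Define how a document is linearised into a text string. This config is not
# consumed by start_document_analysis below; pass it when linearising the
# result, e.g. document.get_text(config=config)
config = TextLinearizationConfig(
    hide_figure_layout=True,
    title_prefix="# ",
    section_header_prefix="## "
)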

# Start text extraction process
document = extractor.start_document_analysis(
    file_source=f"s3://{s3_bucket_name}/{document_name}",
    features=[TextractFeatures.LAYOUT, TextractFeatures.TABLES],
    s3_output_path=f"s3://{s3_bucket_name}/textract-output/",
    save_image=False
)

In [41]:
# To save the textract Document object to a JSON file, use the following:
with open("document.json", "w") as f:
    json.dump(document.response, f, indent=4)

In [3]:
# If starting from a pre-extracted textract Document object, use the following:
document = Document.open("document.json")

# Display contents of the Document object
document

This document holds the following data:
Pages - 6
Words - 3140
Lines - 645
Key-values - 0
Checkboxes - 0
Tables - 7
Queries - 0
Signatures - 0
Identity Documents - 0
Expense Documents - 0

## 5. Unpacking the [Textract Document](https://aws-samples.github.io/amazon-textract-textractor/textractor.entities.html#module-textractor.entities.document) object <a id="section-5"></a>

Once the text has been extracted as a [Textract Document](https://aws-samples.github.io/amazon-textract-textractor/textractor.entities.html#module-textractor.entities.document) object, we can unpack this object to obtain the information in reading order and, in the case of tables, formatted into markdown, CSV, and/or pandas DataFrame.

In [9]:
# Helper function to pretty print text with line wrapping
def prettier_print(text: str, width: int = 100) -> None:
    """
    Pretty print the text with line wrapping. Preserve newlines in the text.
    
    Args:
        * text (`str`): The text to be printed.
        * width (`int`): The maximum width of each line. Default is 100.
    """
    formatted_text = "\n".join(
        [
            textwrap.fill(line, width) if line.strip() != "" else ""
            for line in text.splitlines()
        ]
    )
    print(formatted_text)

In [14]:
for idx, layout in enumerate(document.layouts):  # `idx` avoids shadowing the built-in `id`
    print(
        "="*80,
        f"Layout Type: {layout.layout_type}",
        f"Page: {layout.page}",
        f"Reading Order: {layout.reading_order}",
        f"ID: {layout.id}",
        f"Doc No.: {idx}",
        "-"*80,
        sep="\n"
    )
    prettier_print(layout.text)
    print("="*80, "", sep="\n")

Layout Type: LAYOUT_TITLE
Page: 1
Reading Order: 0
ID: c821335d-7002-4c28-b43a-9723e1c26a31
Doc No.: 0
--------------------------------------------------------------------------------
CREDIT RISK

Layout Type: LAYOUT_TEXT
Page: 1
Reading Order: 1
ID: c716e0c3-fdcc-4365-b8bb-3031cf263fbf
Doc No.: 1
--------------------------------------------------------------------------------
Credit risk is the risk associated with the default or change in credit profile of a client,
counterparty or customer. The Firm provides credit to a variety of customers, ranging from large
corporate and institutional clients to individual consumers and small businesses. The consumer
credit portfolio consists of scored mortgage and home equity loans held in the Consumer & Community
Banking ("CCB") and Asset & Wealth Management ("AWM") business segments; scored mortgage loans held
in the Corporate segment; scored credit card, auto and business banking loans, and overdrafts in
CCB; and the associated lending- relat

In [33]:
prettier_print(document.tables[0].to_markdown(), width=150)

|                                                                      |                                          |
|----------------------------------------------------------------------|------------------------------------------|
| December 31, 2023 (in millions)                                      | Basel III Advanced CECL Transitional RWA |
| Retail exposures                                                     | $ 203,701                                |
| Wholesale exposures                                                  | 502,026                                  |
| Counterparty exposures                                               | 119,310                                  |
| Securitization exposures(a)                                          | 60,476                                   |
| Equity exposures                                                     | 70,073                                   |
| Other exposures                                                      |

In [22]:
for idx, layout in enumerate(document.layouts):  # `idx` avoids shadowing the built-in `id`
    if layout.layout_type == "LAYOUT_TABLE":
        print(
            "="*80,
            f"Layout Type: {layout.layout_type}",
            f"Page: {layout.page}",
            f"Reading Order: {layout.reading_order}",
            f"ID: {layout.id}",
            f"Doc No.: {idx}",
            "-"*80,
            layout.to_markdown(),
            "="*80,
            "",
            "",
            sep="\n"
        )

Layout Type: LAYOUT_TABLE
Page: 1
Reading Order: 13
ID: 69c17b8e-6090-4573-95d5-660244321086
Doc No.: 9
--------------------------------------------------------------------------------


|                                                                      |                                          |
|----------------------------------------------------------------------|------------------------------------------|
| December 31, 2023 (in millions)                                      | Basel III Advanced CECL Transitional RWA |
| Retail exposures                                                     | $ 203,701                                |
| Wholesale exposures                                                  | 502,026                                  |
| Counterparty exposures                                               | 119,310                                  |
| Securitization exposures(a)                                          | 60,476                                   |
|

In [193]:
# Example of extracted table formatted as a pandas DataFrame
document.tables[0].to_pandas()

|   | 0 | 1 |
|---|---|---|
| 0 | December 31, 2023 (in millions) | Basel III Advanced CECL Transitional RWA |
| 1 | Retail exposures | $ 203,701 |
| 2 | Wholesale exposures | 502026 |
| 3 | Counterparty exposures | 119310 |
| 4 | Securitization exposures(a) | 60476 |
| 5 | Equity exposures | 70073 |
| 6 | Other exposures | 157547 |
| 7 | CVA | 43759 |
| 8 | Less: Excess eligible credit reserves not incl... | 1631 |
| 9 | Total credit risk RWA | $ 1,155,261 |
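
The extracted amounts arrive as text, sometimes with a currency symbol and thousands separators (e.g. `$ 203,701`), so they usually need cleaning before any arithmetic. The following is a minimal sketch of one way to do that; `parse_amount` is a hypothetical helper, not part of Textractor or pandas.

```python
import re
from typing import Optional


def parse_amount(cell: str) -> Optional[int]:
    """Convert a cell such as '$ 203,701' to an int; return None for non-numeric text."""
    # Strip dollar signs, thousands separators, and whitespace
    cleaned = re.sub(r"[$,\s]", "", cell)
    return int(cleaned) if re.fullmatch(r"-?\d+", cleaned) else None


for cell in ["$ 203,701", "502,026", "Retail exposures"]:
    print(repr(cell), "->", parse_amount(cell))
```

This sketch does not handle decimals or negative values written in parentheses, which are common in financial reports and would need an extra rule.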


In [18]:
# Example of extracted table returned as a text string and a list of words
table_text, table_text_as_list = document.tables[0].get_text_and_words()
prettier_print(table_text)

December 31, 2023 (in millions) Basel III Advanced CECL Transitional RWA
Retail exposures        $ 203,701
Wholesale exposures     502,026
Counterparty exposures  119,310
Securitization exposures(a)     60,476
Equity exposures        70,073
Other exposures 157,547
CVA     43,759
Less: Excess eligible credit reserves not included in Tier 2 capital    1,631
Total credit risk RWA   $ 1,155,261
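
The linearised table text can be turned back into rows with plain string operations. The sketch below assumes that columns are separated by tab characters and rows by newlines, which is consistent with the spacing of the output above but should be checked against the linearisation config in use. The helper name `split_table_text` and the shortened sample string are illustrative only.

```python
from typing import List


def split_table_text(table_text: str, column_sep: str = "\t") -> List[List[str]]:
    """Split linearised table text into rows of cells, skipping blank lines."""
    return [
        [cell.strip() for cell in line.split(column_sep)]
        for line in table_text.splitlines()
        if line.strip()
    ]


# Shortened sample in the assumed tab-separated form
sample = (
    "December 31, 2023 (in millions)\tBasel III Advanced CECL Transitional RWA\n"
    "Retail exposures\t$ 203,701\n"
    "Wholesale exposures\t502,026\n"
)

rows = split_table_text(sample)
for row in rows:
    print(row)
```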


In [37]:
# Example of detailed breakdown of table structure
document.tables[0].children

[<Cell: (1,1), Span: (1, 1), Column Header: False, MergedCell: False>  December 31, 2023 (in millions),
 <Cell: (1,2), Span: (1, 1), Column Header: True, MergedCell: False>  Basel III Advanced CECL Transitional RWA,
 <Cell: (2,1), Span: (1, 1), Column Header: False, MergedCell: False>  Retail exposures,
 <Cell: (2,2), Span: (1, 1), Column Header: False, MergedCell: False>  $ 203,701,
 <Cell: (3,1), Span: (1, 1), Column Header: False, MergedCell: False>  Wholesale exposures,
 <Cell: (3,2), Span: (1, 1), Column Header: False, MergedCell: False>  502,026,
 <Cell: (4,1), Span: (1, 1), Column Header: False, MergedCell: False>  Counterparty exposures,
 <Cell: (4,2), Span: (1, 1), Column Header: False, MergedCell: False>  119,310,
 <Cell: (5,1), Span: (1, 1), Column Header: False, MergedCell: False>  Securitization exposures(a),
 <Cell: (5,2), Span: (1, 1), Column Header: False, MergedCell: False>  60,476,
 <Cell: (6,1), Span: (1, 1), Column Header: False, MergedCell: False>  Equity exposures

In [19]:
# Raw Textract API response underlying the Document object
document.response

{'DocumentMetadata': {'Pages': 6},
 'JobStatus': 'SUCCEEDED',
 'Blocks': [{'BlockType': 'PAGE',
   'Geometry': {'BoundingBox': {'Width': 1.0,
     'Height': 1.0,
     'Left': 0.0,
     'Top': 0.0},
    'Polygon': [{'X': 0.0, 'Y': 0.0},
     {'X': 1.0, 'Y': 2.5350044552396866e-07},
     {'X': 1.0, 'Y': 1.0},
     {'X': 0.0, 'Y': 1.0}]},
   'Id': '9d6df6a3-708c-47b7-9764-75a8cd1ec910',
   'Relationships': [{'Type': 'CHILD',
     'Ids': ['b74e7905-079e-45cd-b0b1-dd4c752a3c24',
      'd4a630bd-6f8d-4ce6-beff-853fb405f6fa',
      'b688be05-b6bc-44d2-9f71-d7602f130965',
      '9fe58bba-7041-4df2-a022-6c6860ff28f8',
      '42038e15-590a-46a8-ae4f-bffdd392e966',
      '7dd1b004-a88f-4898-832a-ee3616a034f9',
      '530739bb-7bb3-4eaf-b569-7c70965ca242',
      '2be55a43-e9e9-4ca4-8ddb-ef043156664c',
      '1fc19363-5b13-4186-bb3e-4af17cd8de4d',
      '0bb9e8f7-1816-4256-9cea-9e30457224fd',
      '6ff76ed9-cbe7-4029-9648-eaf83041b24c',
      '607168e4-1864-47dc-85bc-11554bb8f23a',
      'bdf4e476