<img src="misc/DHR-Health-Logo.png" width="50%">

### **Business Intelligence**

# py835

The **py835** Python package provides a robust toolset for parsing EDI 835 files using the `pyx12` library. It processes healthcare claim information from EDI 835 files into structured formats like Pandas DataFrames and JSON for seamless data manipulation, reporting, and analysis.

Note that this project is still in its early stages. If you need a stable version, please fork this GitHub repository.



## Features

- **Parse EDI 835 Files:** Load and process `.835` EDI files for healthcare claims and payment information.
- **Extract Data:** Extracts detailed information, including functional groups, transaction sets, claims, services, adjustments, and references.
- **DataFrame Output:** Organizes parsed data into Pandas DataFrames for more convenient analysis.
- **Column Renaming:** Automatically renames columns based on EDI segment codes and descriptions for better readability.
- **Pivot Tables:** Supports pivoting data (e.g., CAS and REF segments) for deeper analysis.
- **JSON Export:** Supports exporting parsed data to JSON format for further use in other systems.

## Installation

To install this package, run the following command:

```bash
pip install git+https://github.com/DHR-Health/py835.git
```

### Dependencies

- `pyx12`: Python library for EDI file parsing.
- `pandas`: Used for organizing parsed data into DataFrames.
- `io`: Standard-library module for handling input/output operations (no installation needed).
- `json`: Standard-library module used for exporting data to JSON format (no installation needed).

<figcaption>Structure of an 835 file</figcaption>
<img src="misc/835 Structure.png">
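
To make the nesting concrete, the sketch below splits a tiny, made-up 835 fragment into segments and elements using plain Python. It assumes the commonly used `~` segment terminator and `*` element separator (real files declare their delimiters in the ISA segment), and it is only an illustration, not how py835 or `pyx12` parse files. The `SVC`, `SE`, and `GE` values are invented.

```python
# A made-up, heavily truncated 835 fragment (not a valid file).
raw = (
    "GS*HP*ABCD*ABCD*20190827~"
    "ST*835*35681~"
    "BPR*I*810.8~"
    "CLP*7722337*1*226*132~"
    "SVC*HC:99213*226*132~"
    "SE*5*35681~"
    "GE*1*1~"
)


def split_segments(text, seg_term="~", elem_sep="*"):
    """Return a list of (segment_id, [elements]) tuples."""
    segments = []
    for chunk in text.split(seg_term):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece after the last terminator
        parts = chunk.split(elem_sep)
        segments.append((parts[0], parts[1:]))
    return segments


segments = split_segments(raw)
for seg_id, elements in segments:
    print(seg_id, elements)
```

Each envelope segment (`GS`, `ST`) opens a level that the later segments belong to, which is the hierarchy the parser tables follow.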

# Usage
Parse an 835 file using `py835` to access the data as Pandas dataframes. The parser systematically breaks down the 835 data into hierarchical layers, reflecting the structure of the EDI 835 file:



```python
import py835

# Initialize the parser with the path to your EDI file
parser = py835.Parser(r'misc\example.835')
parser.pandas.STATEMENTS.head(5)
```

| | header_id | functional_group_id | statement_id | id | name | value |
|---|---|---|---|---|---|---|
| 0 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | ST01-835 | Transaction Set Identifier Code | 835 |
| 1 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | ST02-835 | Transaction Set Control Number | 35681 |
| 2 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | ST03-835 | Implementation Convention Reference | |
| 3 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | BPR01-I | Transaction Handling Code | I |
| 4 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | BPR02-I | Total Actual Provider Payment Amount | 810.8 |


By default, **py835** DataFrames are in long format. They include a custom method, `.flatten()`, that converts them to wide format:

```python
parser.pandas.STATEMENTS.head(5).flatten()
```

| | header_id | functional_group_id | statement_id | BPR01-I | BPR02-I | ST01-835 | ST02-835 | ST03-835 |
|---|---|---|---|---|---|---|---|---|
| 0 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | I | 810.8 | 835 | 35681 | |
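
For readers who want to see what a long-to-wide conversion involves, here is a minimal sketch in plain pandas. It illustrates the idea only and is not the package's `flatten` implementation; the toy rows and the shortened ID strings (`h1`, `g1`, `s1`, `s2`) are made up.

```python
import pandas as pd

# Toy long-format rows shaped like parser.pandas.STATEMENTS (IDs are made up).
long_df = pd.DataFrame(
    {
        "header_id": ["h1"] * 4,
        "functional_group_id": ["g1"] * 4,
        "statement_id": ["s1", "s1", "s2", "s2"],
        "id": ["ST01-835", "BPR02-I", "ST01-835", "BPR02-I"],
        "value": ["835", "810.8", "835", "120.0"],
    }
)

key_cols = ["header_id", "functional_group_id", "statement_id"]

# One row per key combination, one column per element id.
wide_df = (
    long_df.pivot(index=key_cols, columns="id", values="value")
    .reset_index()
    .rename_axis(columns=None)
)
print(wide_df)
```

`DataFrame.pivot` raises an error if the same key combination and `id` appear twice, which is a useful check that the key columns really identify one entity.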


You can also translate the codes into descriptive column names while flattening, and add a prefix:

```python
parser.pandas.STATEMENTS.head(5).flatten(prefix='STATEMENTS ', translate_columns=True)
```

| | header_id | functional_group_id | statement_id | STATEMENTS BPR01-I Transaction Handling Code | STATEMENTS BPR02-I Total Actual Provider Payment Amount | STATEMENTS ST01-835 Transaction Set Identifier Code | STATEMENTS ST02-835 Transaction Set Control Number | STATEMENTS ST03-835 Implementation Convention Reference |
|---|---|---|---|---|---|---|---|---|
| 0 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | I | 810.8 | 835 | 35681 | |
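
The translated headers above appear to combine the prefix, the element ID, and the element name. A sketch of that renaming in plain pandas follows; the helper name `translated_columns` is hypothetical, the toy data is made up, and the real `translate_columns` logic may differ.

```python
import pandas as pd

# Toy long-format rows (statement_id is a made-up placeholder).
long_df = pd.DataFrame(
    {
        "statement_id": ["s1", "s1"],
        "id": ["ST01-835", "BPR02-I"],
        "name": [
            "Transaction Set Identifier Code",
            "Total Actual Provider Payment Amount",
        ],
        "value": ["835", "810.8"],
    }
)


def translated_columns(df, prefix=""):
    """Map each element id to '<prefix><id> <name>' (hypothetical helper)."""
    pairs = df[["id", "name"]].drop_duplicates()
    return {i: f"{prefix}{i} {n}" for i, n in zip(pairs["id"], pairs["name"])}


wide_df = long_df.pivot(index="statement_id", columns="id", values="value").reset_index()
wide_df = wide_df.rename(columns=translated_columns(long_df, prefix="STATEMENTS "))
print(list(wide_df.columns))
```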


You can access individual components of the 835 file using the class tree.

```python
print(f"""
    # First element of the first segment of the first statement in the first functional group of the header
    ID: {parser.HEADER.FUNCTIONAL_GROUPS[0].STATEMENTS[0].segments[0].elements[0].id}
    # Name of the first element of the first segment of the first statement in the first functional group of the header
    Name: {parser.HEADER.FUNCTIONAL_GROUPS[0].STATEMENTS[0].segments[0].elements[0].name}
    # Value of the first element of the first segment of the first statement in the first functional group of the header
    Value: {parser.HEADER.FUNCTIONAL_GROUPS[0].STATEMENTS[0].segments[0].elements[0].value}
""")
```

```text
    # First element of the first segment of the first statement in the first functional group of the header
    ID: ST01-835
    # Name of the first element of the first segment of the first statement in the first functional group of the header
    Name: Transaction Set Identifier Code
    # Value of the first element of the first segment of the first statement in the first functional group of the header
    Value: 835
```
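
To show how a nested tree like this relates to the long-format tables, the sketch below models a stripped-down version with dataclasses and walks it into rows. The class names are hypothetical stand-ins, not py835's classes; only the attribute names `segments`, `elements`, `id`, `name`, and `value` are taken from the example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    id: str
    name: str
    value: str


@dataclass
class Segment:
    elements: List[Element] = field(default_factory=list)


@dataclass
class Statement:
    statement_id: str
    segments: List[Segment] = field(default_factory=list)


def to_long_rows(statement):
    """Flatten one statement into long-format rows (one dict per element)."""
    return [
        {"statement_id": statement.statement_id, "id": e.id, "name": e.name, "value": e.value}
        for seg in statement.segments
        for e in seg.elements
    ]


# "s1" is a made-up statement ID.
st = Statement(
    "s1",
    [
        Segment(
            [
                Element("ST01-835", "Transaction Set Identifier Code", "835"),
                Element("ST02-835", "Transaction Set Control Number", "35681"),
            ]
        ),
        Segment([Element("BPR01-I", "Transaction Handling Code", "I")]),
    ],
)

rows = to_long_rows(st)
for row in rows:
    print(row)
```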




# Data Tree 
The hierarchy of the tables generated by **py835** is as follows:

<img src="misc/tree_structure.png">

# Pandas Dataframes
The **py835** parser generates pandas DataFrames from your 835 file so that you can quickly import the data into your data warehouse. These are available through `parser.pandas`. As the parser moves through each component of the file, it generates the `header_id`, `functional_group_id`, `statement_id`, `claim_id`, and `service_id` values, which you can use to join the tables together.

1. **ISA (Interchange Control Header):**  `parser.pandas.HEADER`
   The top-level layer is the `ISA` segment, which contains metadata about the interchange, such as sender/receiver information, control numbers, and transaction timestamps. This segment serves as a unique identifier for the file. You can retrieve the ISA header as a Pandas DataFrame using `parser.pandas.HEADER`. This allows for easy analysis of interchange metadata, including file-level information.


```python
# Example Header
parser.pandas.HEADER.head(5)
```

| | header_id | id | name | value |
|---|---|---|---|---|
| 0 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | ISA01 | Authorization Information Qualifier | 00 |
| 1 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | ISA02 | Authorization Information | |
| 2 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | ISA03 | Security Information Qualifier | 00 |
| 3 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | ISA04 | Security Information | |
| 4 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | ISA05 | Interchange Sender ID Qualifier | ZZ |



2. **Functional Groups (GS):**  `parser.pandas.FUNCTIONAL_GROUPS`
   Within each `ISA` segment, there are one or more `GS` (Functional Group Header) segments. Functional groups organize related transaction sets under a specific purpose or business function, such as claims, remittance advice, or payment acknowledgments. You can retrieve information about the functional groups as a Pandas DataFrame using `parser.pandas.FUNCTIONAL_GROUPS`. This table can be joined with the ISA table on the `'header_id'` column for comprehensive data analysis across files.


```python
# Example Functional Group
parser.pandas.FUNCTIONAL_GROUPS.head(5)
```

| | header_id | functional_group_id | id | name | value |
|---|---|---|---|---|---|
| 0 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | GS01 | Functional Identifier Code | HP |
| 1 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | GS02 | Application Sender's Code | ABCD |
| 2 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | GS03 | Application Receiver's Code | ABCD |
| 3 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | GS04 | Date | 20190827 |
| 4 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | GS05 | Time | 12345678 |



3. **Statements (ST):**  `parser.pandas.STATEMENTS`
   Inside each functional group, `ST` segments define statements, also known as transaction sets. Each transaction set corresponds to a statement, bundling related claims, payments, or service details. One 835 file can have multiple transaction sets, which serve as logical groups for payment and claim details. You can extract statement data as a Pandas DataFrame using `parser.pandas.STATEMENTS`. This table can be joined with the functional group data using the composite key `['header_id', 'functional_group_id']`.



4. **Claims (CLP):**  `parser.pandas.CLAIMS`
   Each transaction set breaks down further into individual claims (`CLP` segments). Claims represent billing information for healthcare services rendered, including important details such as claim IDs, patient identifiers, the total amount billed, adjustments, payments made, and any denials or rejections. You can retrieve claim information as a Pandas DataFrame using `parser.pandas.CLAIMS`. Claims can be joined to statement data using the composite key `['header_id', 'functional_group_id', 'statement_id']`.

    4a. **Claim Adjustments (CAS):**  `parser.pandas.CLAIMS_CAS`
       Claims often have adjustments (`CAS` segments), which represent reductions or additions to the claim amount based on specific reasons like contractual obligations, patient responsibility, or denials. The parser extracts all adjustments, grouping them by claim, and allows you to retrieve this data in a Pandas DataFrame via `parser.pandas.CLAIMS_CAS`.

    4b. **Claim References (REF):**  `parser.pandas.CLAIMS_REF`
       The parser captures `REF` (Reference Identification) segments, which contain additional reference information related to claims. These may include provider identification numbers, patient account numbers, or other important reference codes. You can access reference data as a Pandas DataFrame via `parser.pandas.CLAIMS_REF`.

    4c. **Claim Service Identification (LQ):**  `parser.pandas.CLAIMS_LQ`
       The `LQ` segments provide additional information related to the services or claims, such as service qualifiers and codes. These segments are extracted into a Pandas DataFrame for claims via `parser.pandas.CLAIMS_LQ`, which can be joined to the claims table.

    4d. **Claim Date/Time (DTM):**  `parser.pandas.CLAIMS_DTM`
       The `DTM` segments represent various date and time-related information for claims (e.g., service dates, adjudication dates). You can retrieve claim-related date/time information as a Pandas DataFrame via `parser.pandas.CLAIMS_DTM`.

```python
# Example Claim Data
parser.pandas.CLAIMS.head(5)
```

| | header_id | functional_group_id | statement_id | claim_id | id | name | value |
|---|---|---|---|---|---|---|---|
| 0 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | JAGubJ-pegsvc-ac3d2U-JVm6D7 | CLP01 | Patient Control Number | 7722337.0 |
| 1 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | JAGubJ-pegsvc-ac3d2U-JVm6D7 | CLP02 | Claim Status Code | 1.0 |
| 2 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | JAGubJ-pegsvc-ac3d2U-JVm6D7 | CLP03 | Total Claim Charge Amount | 226.0 |
| 3 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | JAGubJ-pegsvc-ac3d2U-JVm6D7 | CLP04 | Claim Payment Amount | 132.0 |
| 4 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | JAGubJ-pegsvc-ac3d2U-JVm6D7 | CLP05 | Patient Responsibility Amount | |



5. **Service Line Items (SVC):**  `parser.pandas.SERVICES`
   Within each claim, service line items (`SVC` segments) detail individual healthcare services or procedures performed during the treatment. The line item data includes service codes, charges, allowed amounts, and any related adjustments. You can extract service line data as a Pandas DataFrame using `parser.pandas.SERVICES`. These can be linked to the claims table using the composite key `['header_id', 'functional_group_id', 'statement_id', 'claim_id']`.

    5a. **Service Adjustments (CAS):**  `parser.pandas.SERVICES_CAS`
       Services often have adjustments (`CAS` segments), which represent reductions or additions to the service amount based on specific reasons like contractual obligations, patient responsibility, or denials. The parser extracts all adjustments, grouping them by service level, and allows you to retrieve this data as a Pandas DataFrame via `parser.pandas.SERVICES_CAS`.

    5b. **Service References (REF):**  `parser.pandas.SERVICES_REF`
       The parser captures `REF` (Reference Identification) segments for service-level items, which may contain reference information like procedure codes or authorization numbers. These references are extracted into a Pandas DataFrame using `parser.pandas.SERVICES_REF`.

    5c. **Service Identification (LQ):**  `parser.pandas.SERVICES_LQ`
       Service-level `LQ` segments contain service-specific qualifiers and codes. You can retrieve this information as a Pandas DataFrame via `parser.pandas.SERVICES_LQ`.

    5d. **Service Date/Time (DTM):**  `parser.pandas.SERVICES_DTM`
       The `DTM` segments for services capture date and time-related information (e.g., service dates, procedure dates). This data is available as a Pandas DataFrame using `parser.pandas.SERVICES_DTM`.


```python
# Example Service CAS Data
parser.pandas.SERVICES_CAS
```

| | header_id | functional_group_id | statement_id | claim_id | service_id | id | name | value |
|---|---|---|---|---|---|---|---|---|
| 0 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | JAGubJ-pegsvc-ac3d2U-JVm6D7 | nl7fRY-7FVkUV-2mJawy-t2i3gI | CAS01-CO | Claim Adjustment Group Code | CO |
| 1 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | JAGubJ-pegsvc-ac3d2U-JVm6D7 | nl7fRY-7FVkUV-2mJawy-t2i3gI | CAS02-CO | Adjustment Reason Code | 45 |
| 2 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | JAGubJ-pegsvc-ac3d2U-JVm6D7 | nl7fRY-7FVkUV-2mJawy-t2i3gI | CAS03-CO | Adjustment Amount | 21 |
| 3 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | JAGubJ-pegsvc-ac3d2U-JVm6D7 | nl7fRY-7FVkUV-2mJawy-t2i3gI | CAS04-CO | Adjustment Quantity | |
| 4 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | JAGubJ-pegsvc-ac3d2U-JVm6D7 | nl7fRY-7FVkUV-2mJawy-t2i3gI | CAS05-CO | Adjustment Reason Code | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 470 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | fPfsSV-rsxaRs-AsCg4p-gLJRDn | BKKvyX-vdIxUP-icHxRg-O9y92d | CAS15-CO | Adjustment Amount | |
| 471 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | fPfsSV-rsxaRs-AsCg4p-gLJRDn | BKKvyX-vdIxUP-icHxRg-O9y92d | CAS16-CO | Adjustment Quantity | |
| 472 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | fPfsSV-rsxaRs-AsCg4p-gLJRDn | BKKvyX-vdIxUP-icHxRg-O9y92d | CAS17-CO | Adjustment Reason Code | |
| 473 | otjWzh-ZjqtV4-3Pztpt-bYJ2ce | dc1CcU-A9Crpm-RCOmi9-5tqs5K | qCz1ZE-NpShQQ-9mhd6n-Ij521x | fPfsSV-rsxaRs-AsCg4p-gLJRDn | BKKvyX-vdIxUP-icHxRg-O9y92d | CAS18-CO | Adjustment Amount | |



The parser ensures that all segments (ISA, GS, ST, CLP, SVC) are organized in a structured, hierarchical format for easy access and analysis. It also captures important references and adjustments at various levels using `REF` and `CAS` segments, further enhancing the breakdown of claims and services.
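
The join keys listed above can be sketched with `pandas.merge`. The frames below are tiny, made-up wide tables (one row per entity) rather than real parser output, and the `payment_total`, `claim_paid`, and `service_paid` column names are placeholders.

```python
import pandas as pd

# Made-up wide tables: one row per statement, claim, and service.
statements = pd.DataFrame(
    {
        "header_id": ["h1"],
        "functional_group_id": ["g1"],
        "statement_id": ["s1"],
        "payment_total": [810.8],
    }
)
claims = pd.DataFrame(
    {
        "header_id": ["h1", "h1"],
        "functional_group_id": ["g1", "g1"],
        "statement_id": ["s1", "s1"],
        "claim_id": ["c1", "c2"],
        "claim_paid": [132.0, 50.0],
    }
)
services = pd.DataFrame(
    {
        "header_id": ["h1"] * 3,
        "functional_group_id": ["g1"] * 3,
        "statement_id": ["s1"] * 3,
        "claim_id": ["c1", "c1", "c2"],
        "service_id": ["v1", "v2", "v3"],
        "service_paid": [100.0, 32.0, 50.0],
    }
)

statement_keys = ["header_id", "functional_group_id", "statement_id"]
claim_keys = statement_keys + ["claim_id"]

# Left joins keep every parent row, even when it has no children.
report = (
    statements.merge(claims, on=statement_keys, how="left")
    .merge(services, on=claim_keys, how="left")
)
print(report)
```

Each join adds one more ID to the key, so the result has one row per service line with its parent claim and statement values repeated.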

# Quick Export
You can use the tables in `parser.pandas` to import the data into your data warehouse for long-term storage, though their long-format structure isn't well suited to analytics. For that, `parser.flatten()` offers a quick way to export the data as a single wide table:


```python
from py835 import Parser

# Initialize the parser with the path to your EDI 835 file
parser = Parser(r'misc\example.835')

# Generate a financial report as a Pandas DataFrame
financial_report_df = parser.flatten()

# Display the first few rows of the DataFrame
financial_report_df.head(3)
```

| | index | header_id | HEADER IEA01 Number of Included Functional Groups | HEADER IEA02 Interchange Control Number | HEADER ISA01 Authorization Information Qualifier | HEADER ISA02 Authorization Information | HEADER ISA03 Security Information Qualifier | HEADER ISA04 Security Information | HEADER ISA05 Interchange Sender ID Qualifier | HEADER ISA06 Interchange Sender ID | ... | SERVICE CAS18-CO Adjustment Amount | SERVICE CAS18-PR Adjustment Amount | SERVICE CAS19-CO Adjustment Quantity | SERVICE CAS19-PR Adjustment Quantity | SERVICE DTM01-472 Date Time Qualifier | SERVICE DTM02-472 Service Date | SERVICE DTM03-472 Time | SERVICE DTM04-472 Time Code | SERVICE DTM05-472 Date Time Period Format Qualifier | SERVICE DTM06-472 Date Time Period |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | yOgTWr-noWjkj-jgqvx1-zoG12Z | 1 | 191511902 | 0 | | 0 | | ZZ | ABCPAYER | ... | | | | | 472 | 20190324 | | | | |
| 1 | 1 | yOgTWr-noWjkj-jgqvx1-zoG12Z | 1 | 191511902 | 0 | | 0 | | ZZ | ABCPAYER | ... | | | | | 472 | 20190324 | | | | |
| 2 | 2 | yOgTWr-noWjkj-jgqvx1-zoG12Z | 1 | 191511902 | 0 | | 0 | | ZZ | ABCPAYER | ... | | | | | 472 | 20190324 | | | | |



Note that the resulting dataframe can have over 200 columns, depending on the amount of data in your 835 file.
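
With that many columns, it often helps to narrow the frame by prefix. The sketch below uses `DataFrame.filter` on a made-up wide frame whose column names imitate the `HEADER` and `SERVICE` prefixes shown above; which prefixes the real export uses for every table is an assumption.

```python
import pandas as pd

# Made-up wide frame imitating the export's column naming.
wide = pd.DataFrame(
    {
        "header_id": ["h1", "h1"],
        "HEADER ISA06 Interchange Sender ID": ["ABCPAYER", "ABCPAYER"],
        "SERVICE DTM02-472 Service Date": ["20190324", "20190324"],
        "SERVICE CAS18-CO Adjustment Amount": [None, None],
    }
)

# Keep the key column plus every column that starts with "SERVICE ".
service_cols = wide.filter(regex=r"^(header_id$|SERVICE )")

# Drop columns that are empty for every row.
service_cols = service_cols.dropna(axis="columns", how="all")
print(service_cols.columns.tolist())
```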

This method converts each table to wide format and left-joins the results, starting from `parser.pandas.HEADER` and ending with `parser.pandas.SERVICES_DTM`. Because the relationship from the CAS tables to claims and services is many-to-one, the CAS rows for each claim or service are grouped together in this export.
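
The many-to-one grouping can be illustrated as follows. This sketch collapses several adjustment rows per claim into one row before a left join; the column names `group_code`, `reason_code`, `amount`, and `charge` are placeholders, most values are made up, and the package may group CAS data differently (for example, by pivoting into `CAS01-CO`-style columns).

```python
import pandas as pd

# Made-up claims and adjustment rows; several adjustments can belong to one claim.
claims = pd.DataFrame({"claim_id": ["c1", "c2"], "charge": [226.0, 90.0]})
cas = pd.DataFrame(
    {
        "claim_id": ["c1", "c1", "c2"],
        "group_code": ["CO", "PR", "CO"],
        "reason_code": ["45", "1", "45"],
        "amount": [21.0, 10.0, 5.0],
    }
)

# Collapse to one row per claim so the join cannot duplicate claims.
cas_per_claim = cas.groupby("claim_id", as_index=False).agg(
    adjustment_total=("amount", "sum"),
    adjustment_codes=("group_code", lambda s: ",".join(s)),
)

# validate="one_to_one" raises if either side still has duplicate claim_ids.
merged = claims.merge(cas_per_claim, on="claim_id", how="left", validate="one_to_one")
print(merged)
```

Joining the raw `cas` frame instead would repeat claim `c1` twice and double-count its charge in any later sum.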

## Contributing

Contributions are welcome! Feel free to submit pull requests or open issues.

1. Fork the repo.
2. Create your feature branch (`git checkout -b feature/my-feature`).
3. Commit your changes (`git commit -am 'Add some feature'`).
4. Push to the branch (`git push origin feature/my-feature`).
5. Open a pull request.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
