<img src="misc/DHR-Health-Logo.png" width="50%">

### **Business Intelligence**

# py835

The **py835** Python package provides a robust toolset for parsing EDI 835 files using the `pyx12` library. It processes healthcare claim information from EDI 835 files into structured formats like Pandas DataFrames and JSON for seamless data manipulation, reporting, and analysis. This allows organizations to quickly extract their data from these files for long-term storage and analysis.

Note that this project is still in its early stages. If you require a stable version, please fork this GitHub repository. Translation of coded fields, such as error and reason codes, is planned but not yet available.

#### **Table of Contents**
- [Features](#features)
- [Installation](#installation)
- [Structure of an 835 file](#structure-of-an-835-file)
- [Usage](#usage)
  - [Flattening DataFrames (long-to-wide)](#flattening-dataframes)
  - [Accessing 835 Components](#accessing-835-components)
- [Pandas DataFrames](#pandas-dataframes)
- [Contributing](#contributing)
- [License](#license)

## Features

- **Parse EDI 835 Files:** Load and process `.835` EDI files for healthcare claims and payment information.
- **Extract Data:** Extracts detailed information, including functional groups, transaction sets, claims, services, adjustments, and references.
- **DataFrame Output:** Organizes parsed data into Pandas DataFrames for more convenient analysis.
- **Column Renaming:** Automatically renames columns based on EDI segment codes and descriptions for better readability.
- **Pivot Tables:** Supports pivoting data (e.g., CAS and REF segments) for deeper analysis.
- **JSON Export:** Supports exporting parsed data to JSON format (via pandas) for further use in other systems.


## Installation

To install this package, run the following command:

```bash
pip install git+https://github.com/DHR-Health/py835.git
```

### Dependencies

- `pyx12`: Python library for EDI file parsing.
- `pandas`: Used for organizing parsed data into DataFrames.
- `io`: Standard Python module for handling input/output operations.
- `json`: Used for exporting data to JSON format.

<figcaption><h3>Structure of an 835 file</h3></figcaption>
<img src="misc\835 Structure.png">

## Usage
Parse an 835 file using `py835` to access the data as Pandas dataframes. The parser systematically breaks down the 835 data into hierarchical layers, reflecting the structure of the EDI 835 file. For example, to access statements within the file:


```python
import py835

# Initialize the parser with the path to your EDI file
parser = py835.Parser(r'misc\example.835')
parser.TABLES['ST']
```

|   | header_id | functional_group_id | statement_id | segment | field | name | value |
|---|---|---|---|---|---|---|---|
| 0 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | THRyGH-OMHBlj-AcWmBT-X3eblV | ST | ST01 | Transaction Set Identifier Code | 835.0 |
| 1 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | THRyGH-OMHBlj-AcWmBT-X3eblV | ST | ST02 | Transaction Set Control Number | 35681.0 |
| 2 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | THRyGH-OMHBlj-AcWmBT-X3eblV | ST | ST03 | Implementation Convention Reference |  |


You can find out which tables are available using the dictionary keys:

```python
[key for key in parser.TABLES]
```

```text
['HEADER',
 'FUNCTIONAL_GROUPS',
 'STATEMENTS',
 'STATEMENTS_REF',
 'CLAIMS',
 'SERVICES',
 'SERVICES_CAS',
 'SERVICES_DTM',
 'FOOTER']
```
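
Because the tables live in an ordinary dictionary, they can be processed in a loop. The following is a minimal sketch of exporting every table to JSON through pandas. It uses a small stand-in dictionary of DataFrames instead of a real `parser.TABLES`, and the helper name `tables_to_json` is hypothetical, not part of py835.

```python
import json

import pandas as pd

# Stand-in for parser.TABLES: a plain dict of long-format DataFrames.
# The rows are abridged from the example outputs in this README.
tables = {
    "ST": pd.DataFrame(
        {
            "statement_id": ["THRyGH-OMHBlj-AcWmBT-X3eblV"] * 2,
            "field": ["ST01", "ST02"],
            "value": ["835", "35681"],
        }
    ),
    "CLP": pd.DataFrame(
        {
            "claim_id": ["xlMbJ4-uD6zrg-dRdx5Y-mrcgUE"],
            "field": ["CLP01"],
            "value": ["7722337"],
        }
    ),
}


def tables_to_json(tables):
    """Serialize each DataFrame to a JSON string with one object per row."""
    return {name: df.to_json(orient="records") for name, df in tables.items()}


exported = tables_to_json(tables)
for name, payload in exported.items():
    # json.loads confirms that each payload is valid JSON.
    print(name, len(json.loads(payload)), "rows")
```

With a real parser, passing `parser.TABLES` in place of the stand-in should work the same way, assuming each entry behaves like a regular pandas DataFrame.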

### Flattening DataFrames

By default, **py835** DataFrames are in long format. The DataFrames generated by the package include a custom method, `.flatten()`, that converts these tables to wide format:

```python
# Original table structure
parser.TABLES['ST']
```

|   | header_id | functional_group_id | statement_id | segment | field | name | value |
|---|---|---|---|---|---|---|---|
| 0 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | THRyGH-OMHBlj-AcWmBT-X3eblV | ST | ST01 | Transaction Set Identifier Code | 835.0 |
| 1 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | THRyGH-OMHBlj-AcWmBT-X3eblV | ST | ST02 | Transaction Set Control Number | 35681.0 |
| 2 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | THRyGH-OMHBlj-AcWmBT-X3eblV | ST | ST03 | Implementation Convention Reference |  |


```python
parser.TABLES['ST'].flatten()
```

|   | header_id | functional_group_id | statement_id | ST ST01 | ST ST02 | ST ST03 |
|---|---|---|---|---|---|---|
| 0 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | THRyGH-OMHBlj-AcWmBT-X3eblV | 835 | 35681 |  |
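
To make the long-to-wide idea concrete, here is a rough sketch of the same reshaping done with plain pandas. It is not the package's implementation of `.flatten()`; the helper `long_to_wide` and the abridged sample rows are assumptions made for illustration.

```python
import pandas as pd

# Long format: one row per field, as in the ST table above (abridged).
long_df = pd.DataFrame(
    {
        "statement_id": ["THRyGH-OMHBlj-AcWmBT-X3eblV"] * 3,
        "segment": ["ST"] * 3,
        "field": ["ST01", "ST02", "ST03"],
        "value": ["835", "35681", None],
    }
)


def long_to_wide(df, id_cols):
    """Turn one-row-per-field data into one row per id with a column per field."""
    out = df.copy()
    # Build column labels such as "ST ST01" from the segment and field names.
    out["column"] = out["segment"] + " " + out["field"]
    wide = out.set_index(id_cols + ["column"])["value"].unstack("column")
    wide.columns.name = None
    return wide.reset_index()


wide_df = long_to_wide(long_df, ["statement_id"])
print(wide_df)
```

The id columns stay as row identifiers, and every distinct segment/field pair becomes its own column, which is the shape shown in the flattened output above.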


You can also add field descriptions while flattening (`descriptions=True`) and add a prefix to the flattened column names (`prefix=<PREFIX>`):

```python
parser.TABLES['ST'].flatten(prefix='My Prefix ', descriptions=True)
```

|   | header_id | functional_group_id | statement_id | My Prefix ST ST01 Transaction Set Identifier Code | My Prefix ST ST02 Transaction Set Control Number | My Prefix ST ST03 Implementation Convention Reference |
|---|---|---|---|---|---|---|
| 0 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | THRyGH-OMHBlj-AcWmBT-X3eblV | 835 | 35681 |  |


### Accessing 835 Components

You can also access individual components of the 835 file through the `parser.dict` dictionary, which holds the same data as nested, JSON-like lists and dictionaries:

```python
# Get the REF table as JSON-like data
parser.dict['REF']
```

```python
[{'header_id': 'lpgxdL-dVX6Ky-gRRL2K-RN2KIK',
  'functional_group_id': '4VECQI-4wkvQE-hl9HaM-cIOfEq',
  'statement_id': 'THRyGH-OMHBlj-AcWmBT-X3eblV',
  'ref_id': 'QAWOBu-0cGPAC-hu4Q3v-F2OaGO',
  'segments': [{'segment': 'REFEV',
    'REF01': {'name': 'Reference Identification Qualifier', 'value': 'EV'},
    'REF02': {'name': 'Receiver Identifier', 'value': 'XYZ CLEARINGHOUSE'},
    'REF03': {'name': 'Description', 'value': None},
    'REF04': {'name': 'Reference Identifier', 'value': None}}]},
 {'header_id': 'lpgxdL-dVX6Ky-gRRL2K-RN2KIK',
  'functional_group_id': '4VECQI-4wkvQE-hl9HaM-cIOfEq',
  'statement_id': 'THRyGH-OMHBlj-AcWmBT-X3eblV',
  'ref_id': 'Z3Y4eR-OarHxq-ZMg3fR-DfCwCL',
  'segments': [{'segment': 'REFTJ',
    'REF01': {'name': 'Reference Identification Qualifier', 'value': 'TJ'},
    'REF02': {'name': 'Additional Payee Identifier', 'value': '212121212'},
    'REF03': {'name': 'Description', 'value': None},
    'REF04': {'name': 'Reference Identifier', 'value': None}}]}]
```
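
The nested structure can be walked with ordinary loops. The sketch below assumes records shaped like the output above (abridged here to `REF01` and `REF02`) and collects each reference value under its qualifier. The helper `ref_values` is hypothetical and not part of py835.

```python
# Records copied from the example output, abridged to REF01 and REF02.
ref_records = [
    {
        "ref_id": "QAWOBu-0cGPAC-hu4Q3v-F2OaGO",
        "segments": [
            {
                "segment": "REFEV",
                "REF01": {"name": "Reference Identification Qualifier", "value": "EV"},
                "REF02": {"name": "Receiver Identifier", "value": "XYZ CLEARINGHOUSE"},
            }
        ],
    },
    {
        "ref_id": "Z3Y4eR-OarHxq-ZMg3fR-DfCwCL",
        "segments": [
            {
                "segment": "REFTJ",
                "REF01": {"name": "Reference Identification Qualifier", "value": "TJ"},
                "REF02": {"name": "Additional Payee Identifier", "value": "212121212"},
            }
        ],
    },
]


def ref_values(records):
    """Map each REF01 qualifier to the REF02 value found in the same segment."""
    found = {}
    for record in records:
        for segment in record["segments"]:
            found[segment["REF01"]["value"]] = segment["REF02"]["value"]
    return found


refs = ref_values(ref_records)
print(refs)
```

If a qualifier occurs more than once, this sketch keeps only the last value; collecting values into a list per qualifier would avoid that.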

## Pandas DataFrames
The **py835** parser generates pandas DataFrames from your 835 file so that you can quickly import the data into your data warehouse. These are available through the `parser.TABLES` dictionary. The parser generates IDs as it passes through each component; in the example outputs they appear as the columns `header_id`, `functional_group_id`, `statement_id`, `claim_id`, and `service_id`. You can use these IDs to join the tables together.
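
As a sketch of such a join, the example below merges two small hand-built tables with `pandas.DataFrame.merge`. The tables stand in for flattened statement and claim tables; the id column names follow the example outputs in this README, while the shortened id values and the second claim row are made up for illustration.

```python
import pandas as pd

# Stand-in for a flattened statements table (one row per statement).
statements = pd.DataFrame(
    {
        "header_id": ["H1"],
        "functional_group_id": ["G1"],
        "statement_id": ["S1"],
        "ST ST02": ["35681"],
    }
)

# Stand-in for a flattened claims table (one row per claim).
claims = pd.DataFrame(
    {
        "header_id": ["H1", "H1"],
        "functional_group_id": ["G1", "G1"],
        "statement_id": ["S1", "S1"],
        "claim_id": ["C1", "C2"],
        "CLP CLP01": ["7722337", "7722338"],  # second value is hypothetical
    }
)

# Join each claim to its parent statement on the shared id columns.
keys = ["header_id", "functional_group_id", "statement_id"]
claims_with_statement = claims.merge(
    statements, on=keys, how="left", validate="many_to_one"
)
print(claims_with_statement)
```

Passing `validate="many_to_one"` makes pandas raise an error if a statement key is duplicated on the right-hand side, which guards against accidentally multiplying claim rows.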

1. **ISA (Interchange Control Header):**  `parser.TABLES['ISA']`
   
   The top-level layer is the ISA segment, also called the header, which contains metadata about the interchange, such as sender/receiver information, control numbers, and transaction timestamps. This segment serves as a unique identifier for the file. You can retrieve the ISA header as a Pandas DataFrame using `parser.TABLES['ISA']`, which makes file-level interchange metadata easy to analyze.


```python
# Example Header
parser.TABLES['ISA'].head(5)
```

|   | header_id | segment | field | name | value |
|---|---|---|---|---|---|
| 0 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | ISA | ISA01 | Authorization Information Qualifier | 00 |
| 1 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | ISA | ISA02 | Authorization Information |  |
| 2 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | ISA | ISA03 | Security Information Qualifier | 00 |
| 3 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | ISA | ISA04 | Security Information |  |
| 4 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | ISA | ISA05 | Interchange Sender ID Qualifier | ZZ |



2. **Functional Groups (GS):**  `parser.TABLES['GS']`
   
   Within each `ISA` segment, there are one or more `GS` (Functional Group Header) segments. Functional groups organize related transaction sets under a specific purpose or business function, such as claims, remittance advice, or payment acknowledgments. You can retrieve information about the functional groups as a Pandas DataFrame using `parser.TABLES['GS']`. This table can be joined with the ISA table on the `header_id` column (as it appears in the example output) for comprehensive data analysis across files.


```python
# Example Functional Group
parser.TABLES['GS'].head(5)
```

|   | header_id | functional_group_id | segment | field | name | value |
|---|---|---|---|---|---|---|
| 0 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | GS | GS01 | Functional Identifier Code | HP |
| 1 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | GS | GS02 | Application Sender's Code | ABCD |
| 2 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | GS | GS03 | Application Receiver's Code | ABCD |
| 3 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | GS | GS04 | Date | 20190827 |
| 4 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | GS | GS05 | Time | 12345678 |



3. **Statements (ST):**  `parser.TABLES['ST']`
   
   Inside each functional group, `ST` segments define statements, also known as transaction sets. Each transaction set corresponds to a statement, bundling related claims, payments, or service details. One 835 file can have multiple transaction sets, which serve as logical groups for payment and claim details. You can extract statement data as a Pandas DataFrame using `parser.TABLES['ST']`. This table can be joined with the functional group data using the composite key `['header_id', 'functional_group_id']` (column names as they appear in the example output).



4. **Claims (CLP):**  `parser.TABLES['CLP']`

   Each transaction set breaks down further into individual claims (`CLP` segments). Claims represent billing information for healthcare services rendered, including important details such as claim IDs, patient identifiers, the total amount billed, adjustments, payments made, and any denials or rejections. You can retrieve claim information as a Pandas DataFrame using `parser.TABLES['CLP']`. Claims can be joined to statement data using the composite key `['header_id', 'functional_group_id', 'statement_id']` (column names as they appear in the example output).

```python
# Example Claim Data
parser.TABLES['CLP'].head(5)
```

|   | header_id | functional_group_id | statement_id | claim_id | segment | field | name | value |
|---|---|---|---|---|---|---|---|---|
| 0 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | THRyGH-OMHBlj-AcWmBT-X3eblV | xlMbJ4-uD6zrg-dRdx5Y-mrcgUE | CLP | CLP01 | Patient Control Number | 7722337.0 |
| 1 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | THRyGH-OMHBlj-AcWmBT-X3eblV | xlMbJ4-uD6zrg-dRdx5Y-mrcgUE | CLP | CLP02 | Claim Status Code | 1.0 |
| 2 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | THRyGH-OMHBlj-AcWmBT-X3eblV | xlMbJ4-uD6zrg-dRdx5Y-mrcgUE | CLP | CLP03 | Total Claim Charge Amount | 226.0 |
| 3 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | THRyGH-OMHBlj-AcWmBT-X3eblV | xlMbJ4-uD6zrg-dRdx5Y-mrcgUE | CLP | CLP04 | Claim Payment Amount | 132.0 |
| 4 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | THRyGH-OMHBlj-AcWmBT-X3eblV | xlMbJ4-uD6zrg-dRdx5Y-mrcgUE | CLP | CLP05 | Patient Responsibility Amount |  |



5. **Service Line Items (SVC):**  `parser.TABLES['SERVICES']`
   
   Within each claim, service line items (`SVC` segments) detail individual healthcare services or procedures performed during the treatment. The line item data includes service codes, charges, allowed amounts, and any related adjustments. You can extract service line data as a Pandas DataFrame using `parser.TABLES['SERVICES']`. These can be linked to the claims table using the composite key `['header_id', 'functional_group_id', 'statement_id', 'claim_id']` (column names as they appear in the example output).

6. **Adjustments (CAS):**  `parser.TABLES['CAS']`

   Services often have adjustments (`CAS` segments), which represent reductions or additions to the claim/service amount based on specific reasons like contractual obligations, patient responsibility, or denials. The parser extracts all adjustments, grouping them by service level, and allows you to retrieve this data as a Pandas DataFrame via `parser.TABLES['CAS']`.

7. **References (REF):**  `parser.TABLES['REF']`

   The parser captures `REF` (Reference Identification) segments, which may contain reference information like procedure codes or authorization numbers. These references are extracted into a Pandas DataFrame using `parser.TABLES['REF']`.

8. **Identification (LQ):**  `parser.TABLES['LQ']`

   Service-level `LQ` segments contain service-specific qualifiers and codes. You can retrieve this information as a Pandas DataFrame via `parser.TABLES['LQ']`.

9. **Date/Time (DTM):**  `parser.TABLES['DTM']`

   The `DTM` segments for services capture date and time-related information (e.g., service dates, procedure dates). This data is available as a Pandas DataFrame using `parser.TABLES['DTM']`.

10. **Amounts (AMT):**  `parser.TABLES['AMT']`

    The `AMT` segments capture additional monetary amounts that are not part of the claim charges but can affect the final payment, such as patient responsibility amounts, deductible amounts, or other financial considerations. These amounts are recorded at various levels, and the parser organizes them into a Pandas DataFrame. You can access this data via `parser.TABLES['AMT']` to review the monetary details tied to the service or claim.

11. **Provider-Level Adjustments (PLB):**  `parser.TABLES['PLB']`

    `PLB` segments represent provider-level adjustments that affect the total payment but are unrelated to individual claims. These adjustments can be for reasons such as interest payments, corrections, or refunds. The parser extracts `PLB` adjustments and structures them in a Pandas DataFrame, accessible through `parser.TABLES['PLB']`. This table gives a clear view of all provider-level adjustments that impact payment reconciliation.



```python
# Example Service CAS Data
parser.TABLES['CAS'].head(3)
```

|   | header_id | functional_group_id | statement_id | claim_id | service_id | cas_id | segment | field | name | value |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | THRyGH-OMHBlj-AcWmBT-X3eblV | xlMbJ4-uD6zrg-dRdx5Y-mrcgUE | ScOXv9-lhnprx-ShHHfN-fTE57w | nDRLg4-OSgUk0-ki1TD7-vuTR8I | CASCO | CAS01 | Claim Adjustment Group Code | CO |
| 1 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | THRyGH-OMHBlj-AcWmBT-X3eblV | xlMbJ4-uD6zrg-dRdx5Y-mrcgUE | ScOXv9-lhnprx-ShHHfN-fTE57w | nDRLg4-OSgUk0-ki1TD7-vuTR8I | CASCO | CAS02 | Adjustment Reason Code | 45 |
| 2 | lpgxdL-dVX6Ky-gRRL2K-RN2KIK | 4VECQI-4wkvQE-hl9HaM-cIOfEq | THRyGH-OMHBlj-AcWmBT-X3eblV | xlMbJ4-uD6zrg-dRdx5Y-mrcgUE | ScOXv9-lhnprx-ShHHfN-fTE57w | nDRLg4-OSgUk0-ki1TD7-vuTR8I | CASCO | CAS03 | Adjustment Amount | 21 |
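
Long-format adjustment rows like these can be summarized with standard pandas operations. The sketch below totals adjustment amounts by group code and reason code. It assumes a table with `cas_id`, `field`, and `value` columns as in the output above; the short ids and the second adjustment (`PR`, `1`, `10`) are hypothetical additions so that there is something to aggregate.

```python
import pandas as pd

# Long-format CAS rows. Adjustment "A1" mirrors the example output
# (CO / 45 / 21); adjustment "A2" is a hypothetical extra record.
cas_long = pd.DataFrame(
    {
        "cas_id": ["A1"] * 3 + ["A2"] * 3,
        "field": ["CAS01", "CAS02", "CAS03"] * 2,
        "value": ["CO", "45", "21", "PR", "1", "10"],
    }
)

# One row per adjustment, with CAS01, CAS02 and CAS03 as columns.
cas_wide = cas_long.set_index(["cas_id", "field"])["value"].unstack("field")

# Amounts are stored as text in this sketch, so convert before summing.
cas_wide["CAS03"] = pd.to_numeric(cas_wide["CAS03"])

# Total adjustment amount per (group code, reason code) pair.
totals = cas_wide.groupby(["CAS01", "CAS02"])["CAS03"].sum()
print(totals)
```

The same pattern extends to grouping by `claim_id` or `service_id` when those columns are kept in the index before unstacking.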




## Contributing

Contributions are welcome! Feel free to submit pull requests or open issues.

1. Fork the repo.
2. Create your feature branch (`git checkout -b feature/my-feature`).
3. Commit your changes (`git commit -am 'Add some feature'`).
4. Push to the branch (`git push origin feature/my-feature`).
5. Open a pull request.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
