# UN/EDIFACT Message Reader

## Overview

### Requirements

- Python 3.6 or higher

### Features
- All encodings
- Version independent
- XML

### Functions

- parse_edi(data: bytes)                -> list
- make_edi(segments: list)              -> bytes
- parse_xml(root: ElementTree.Element)  -> list
- make_xml(segments: list)              -> ElementTree.Element
- pretty_xml(root: ElementTree.Element) -> str

### Experimental
- report(segments: list, sd: dict, ed: dict)       -> str
- make_edi_xml(segments: list, sd: dict, ed: dict) -> ElementTree.Element

## EDI
The specifications are published by UNECE: http://www.unece.org

### Example message
Read an EDIFACT file in binary mode:

In [1]:
edi = open('order.edi', 'rb').read()
edi[:80]  # first 80 bytes


b"UNA:+.? 'UNB+UNOY:4+INVALIDATORSTUDIO:1+BYTESREADER:1+20180630:1159+6002'UNH+123"

How an EDIFACT message is decoded depends on the __UNB__-Segment (the interchange header).

In this example __UNOY__ indicates __UTF-8__-encoding.

In [2]:
from edixml import ENCODINGS

ENCODINGS['UNOY']

{'ENCODING': 'utf8'}
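As a minimal sketch of this lookup, the syntax identifier can be read from the raw bytes before any decoding takes place. The function `guess_encoding` and the table `SYNTAX_ENCODINGS` are hypothetical names for illustration and are not part of `edixml`. Only the `UNOY` entry comes from this README; the other entries are assumptions based on the ISO 9735 character repertoires, and the table is deliberately incomplete.

```python
# Hypothetical, partial table: syntax identifier -> Python codec name.
SYNTAX_ENCODINGS = {
    "UNOA": "ascii",    # assumption: ISO 646 repertoire
    "UNOB": "ascii",    # assumption: ISO 646 repertoire
    "UNOC": "latin-1",  # assumption: ISO 8859-1
    "UNOY": "utf8",     # as stated in this README
}


def guess_encoding(data: bytes, default: str = "ascii") -> str:
    """Return a codec name based on the syntax identifier of the UNB segment."""
    start = data.find(b"UNB")
    if start < 0:
        return default
    # The identifier follows 'UNB' and one data element separator byte.
    identifier = data[start + 4:start + 8].decode("ascii", errors="replace")
    return SYNTAX_ENCODINGS.get(identifier, default)


header = b"UNA:+.? 'UNB+UNOY:4+INVALIDATORSTUDIO:1+BYTESREADER:1+20180630:1159+6002'"
print(guess_encoding(header))  # utf8
```

The sketch assumes that the service segments themselves are ASCII-compatible and that `UNB` does not occur earlier in the data.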

In [3]:
edi.decode('utf8')[:80]

"UNA:+.? 'UNB+UNOY:4+INVALIDATORSTUDIO:1+BYTESREADER:1+20180630:1159+6002'UNH+123"

### Message-Syntax

The __UNB-Segment__ indicates the encoding and __Syntax-Version__ for the __Service-Segments__.

The type of the message is indicated in the __UNH-Segment__. <br>
In this example the version of the message is __D18A__ (Year 2018, 1st release) and the type of the message is __ORDERS__.
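The reading of `D18A` can be sketched in a few lines. `describe_message` is a hypothetical helper, not part of `edixml`; it expects the elements of a parsed UNH segment in the nested-list form used throughout this README. The century pivot for two-digit years is an assumption made for this sketch.

```python
def describe_message(unh_elements):
    """Summarise the message identifier found in a parsed UNH segment."""
    message_type, version, release = unh_elements[1][:3]
    year = int(release[:2])
    # Assumption: two-digit years from 80 upwards belong to the 1900s.
    year += 1900 if year >= 80 else 2000
    # Release letters count upwards within a year: A = 1st, B = 2nd, ...
    release_in_year = ord(release[2].upper()) - ord("A") + 1
    return {
        "type": message_type,
        "directory": version + release,
        "year": year,
        "release_in_year": release_in_year,
    }


unh = [['123456'], ['ORDERS', 'D', '18A', 'UN', 'EAN008']]
print(describe_message(unh))
# {'type': 'ORDERS', 'directory': 'D18A', 'year': 2018, 'release_in_year': 1}
```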


The __UNA-Segment__ defines the _special characters_ used to split the message into its __Segments__, __Dataelements__ and __Components__:

- Each __Segment__ is identified by its three-character __Segment Qualifier__ (UNA, UNB, UNH...) and ends with the __Segment-Terminator__ (')
- Each __Segment__ has __Dataelements__, separated by the __Dataelement-Separator__ (+)
- Each __Dataelement__ has __Components__, separated by the __Component-Separator__ (:)
- The __Decimal-Point-Character__ (.) defines the representation of __Numeric-Values__
- The __Release-Character__ (?) escapes a special character that occurs inside a value
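The splitting rules above can be sketched as a small tokenizer. This is a simplified illustration, not the implementation behind `parse_edi`: the names `split_escaped`, `unescape` and `split_message` are hypothetical, the sketch works on an already decoded string, and it skips the UNA segment instead of returning it. It also honours the release character (`?` by default), which protects a special character inside a value.

```python
def split_escaped(text, separator, release):
    """Split text on separator unless the separator is escaped by release."""
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == release and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escape pair for later passes
            i += 2
        elif text[i] == separator:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def unescape(text, release):
    """Drop each release character and keep the character it protects."""
    out, i = [], 0
    while i < len(text):
        if text[i] == release and i + 1 < len(text):
            i += 1
        out.append(text[i])
        i += 1
    return "".join(out)


def split_message(message):
    """Return [tag, elements] pairs, with elements as lists of components."""
    if message.startswith("UNA"):
        # UNA layout: component, element, decimal, release, reserved, terminator
        comp, elem, _decimal, rel, _reserved, term = message[3:9]
        body = message[9:]
    else:
        comp, elem, rel, term = ":", "+", "?", "'"  # default service characters
        body = message
    result = []
    for raw in split_escaped(body, term, rel):
        raw = raw.strip("\r\n")
        if not raw:
            continue
        tag, *elements = split_escaped(raw, elem, rel)
        result.append([tag, [[unescape(c, rel) for c in split_escaped(e, comp, rel)]
                             for e in elements]])
    return result


sample = "UNA:+.? 'UNH+123456+ORDERS:D:18A:UN:EAN008'FTX+AFM+1++IT?'S A TEST'"
print(split_message(sample))
```

Splitting keeps each escape pair intact and unescaping happens only at the component level, so an escaped separator never ends a segment or element early.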

In [4]:
from edixml import parse_edi
segments = parse_edi(edi)
segments

[['UNA', [':', '+', '.', '?', ' ', "'"]],
 ['UNB',
  [['UNOY', '4'],
   ['INVALIDATORSTUDIO', '1'],
   ['BYTESREADER', '1'],
   ['20180630', '1159'],
   ['6002']]],
 ['UNH', [['123456'], ['ORDERS', 'D', '18A', 'UN', 'EAN008']]],
 ['BGM', [['220'], ['4711'], ['9']]],
 ['DTM', [['137', '20180630', '102']]],
 ['NAD', [['BY'], ['31-424-2022', '', '16']]],
 ['NAD', [['SU'], ['34-093-1588', '', '16']]],
 ['LIN', [['1'], ['1'], ['0764569104', 'IB']]],
 ['QTY', [['1', '25']]],
 ['FTX', [['AFM'], ['1'], [''], ["XPATH 2.0 PROGRAMMER'S REFERENCE"]]],
 ['LIN', [['2'], ['1'], ['0764569090', 'IB']]],
 ['QTY', [['1', '25']]],
 ['FTX', [['AFM'], ['1'], [''], ["XSLT 2.0 PROGRAMMER'S REFERENCE"]]],
 ['LIN', [['3'], ['1'], ['1861004656', 'IB']]],
 ['QTY', [['1', '16']]],
 ['FTX', [['AFM'], ['1'], [''], ['JAVA SERVER PROGRAMMING']]],
 ['LIN', [['4'], ['1'], ['0-19-501476-6', 'IB']]],
 ['QTY', [['1', '10']]],
 ['FTX', [['AFM'], ['1'], [''], ['TZUN TZU']]],
 ['UNS', [['S']]],
 ['CNT', [['2', '4']]],
 ['UNT'

### Indexing

In [5]:
segments[7]

['LIN', [['1'], ['1'], ['0764569104', 'IB']]]

In [6]:
seg, elements = segments[7]
seg

'LIN'

In [7]:
elements

[['1'], ['1'], ['0764569104', 'IB']]

### Semantics - Code Table
For __each__ version there are:

- hundreds of message-code-tables
- hundreds of segment-code-tables
- hundreds of element-code-tables with over 10,000 different codes

The _full_ implementation of one message for one version was expected to take half a year.

- LIN - Segment-table: https://service.unece.org/trade/untdid/d18a/trsd/trsdlin.htm
- 7143 - Code-table: https://service.unece.org/trade/untdid/d18a/tred/tred7143.htm

In [8]:
isbns = [elements[2][0] for seg, elements in segments 
         if seg == 'LIN' and elements[2][1] == 'IB']
isbns

['0764569104', '0764569090', '1861004656', '0-19-501476-6']

### Formatting

In [9]:
from edixml import make_edi
edmoji = make_edi(segments, 
                  component_separator='✉',
                  dataelement_separator='☺',
                  decimal_mark='☣',
                  release_char='☎',
                  segment_terminator='❤',
                  with_newline=True)

print(edmoji.decode('utf8'))

UNA✉☺☣☎ ❤
UNB☺UNOY✉4☺INVALIDATORSTUDIO✉1☺BYTESREADER✉1☺20180630✉1159☺6002❤
UNH☺123456☺ORDERS✉D✉18A✉UN✉EAN008❤
BGM☺220☺4711☺9❤
DTM☺137✉20180630✉102❤
NAD☺BY☺31-424-2022✉✉16❤
NAD☺SU☺34-093-1588✉✉16❤
LIN☺1☺1☺0764569104✉IB❤
QTY☺1✉25❤
FTX☺AFM☺1☺☺XPATH 2.0 PROGRAMMER'S REFERENCE❤
LIN☺2☺1☺0764569090✉IB❤
QTY☺1✉25❤
FTX☺AFM☺1☺☺XSLT 2.0 PROGRAMMER'S REFERENCE❤
LIN☺3☺1☺1861004656✉IB❤
QTY☺1✉16❤
FTX☺AFM☺1☺☺JAVA SERVER PROGRAMMING❤
LIN☺4☺1☺0-19-501476-6✉IB❤
QTY☺1✉10❤
FTX☺AFM☺1☺☺TZUN TZU❤
UNS☺S❤
CNT☺2✉4❤
UNT☺22☺SSDD1❤
UNZ☺1☺6002❤
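The reverse direction can be sketched in the same spirit. `join_segments` and `escape` are hypothetical helpers, not the code behind `make_edi`; the sketch returns a string instead of bytes and does not emit a UNA segment. It shows why an apostrophe needs the release character with the default separators but can stay unescaped once a different segment terminator is chosen.

```python
def escape(value, specials, release):
    """Prefix every special character in value with the release character."""
    return "".join(release + ch if ch in specials else ch for ch in value)


def join_segments(segments, comp=":", elem="+", rel="?", term="'"):
    """Serialise [tag, elements] pairs with the given service characters."""
    specials = {comp, elem, rel, term}
    out = []
    for tag, elements in segments:
        data = [comp.join(escape(c, specials, rel) for c in components)
                for components in elements]
        out.append(elem.join([tag] + data) + term)
    return "".join(out)


demo = [['BGM', [['220'], ['4711'], ['9']]],
        ['FTX', [['AFM'], ['1'], [''], ["IT'S"]]]]

print(join_segments(demo))
# BGM+220+4711+9'FTX+AFM+1++IT?'S'
print(join_segments(demo, comp='✉', elem='☺', rel='☎', term='❤'))
# BGM☺220☺4711☺9❤FTX☺AFM☺1☺☺IT'S❤
```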


## XML

### Mapping to XML

In [10]:
from edixml import make_xml
xml = make_xml(segments)
type(xml)

xml.etree.ElementTree.Element
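A minimal sketch of such a mapping, assuming the tag naming that is visible in this README's serialized output (`UNB0` for the first data element of `UNB`, `UNB00` for its first component). `to_xml` is a hypothetical function written for illustration; it is not the implementation of `make_xml`, and how the library names tags for indexes of 10 and above is not covered here.

```python
from xml.etree import ElementTree


def to_xml(segments):
    """Map [tag, elements] pairs onto a nested ElementTree structure."""
    root = ElementTree.Element("EDIFACT")
    for tag, elements in segments:
        segment = ElementTree.SubElement(root, tag)
        if tag == "UNA":
            # The UNA characters are kept together as plain text.
            segment.text = "".join(elements)
            continue
        for i, components in enumerate(elements):
            element = ElementTree.SubElement(segment, f"{tag}{i}")
            for j, value in enumerate(components):
                ElementTree.SubElement(element, f"{tag}{i}{j}").text = value
    return root


demo = [['UNA', [':', '+', '.', '?', ' ', "'"]],
        ['UNB', [['UNOY', '4'], ['6002']]]]
root = to_xml(demo)
print(ElementTree.tostring(root, encoding="unicode"))
```

Because every level is addressable by position, `root[1][0][0].text` yields `'UNOY'`, mirroring the list indexing `demo[1][1][0][0]`.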

### Indexing

In [11]:
xml[7].tag

'LIN'

### Index-Semantics

In [12]:
isbns = [elements[2][0].text for elements in xml 
         if elements.tag == 'LIN' and elements[2][1].text == 'IB']
isbns

['0764569104', '0764569090', '1861004656', '0-19-501476-6']

### Formatting

In [13]:
from xml.etree import ElementTree

ElementTree.tostring(xml, encoding='utf8').decode('utf8')[:100]

"<?xml version='1.0' encoding='utf8'?>\n<EDIFACT><UNA>:+.? '</UNA><UNB><UNB0><UNB00>UNOY</UNB00><UNB01"

In [14]:
from edixml import pretty_xml

print(pretty_xml(xml)[:140])

<?xml version="1.0" ?>
<EDIFACT>
    <UNA>:+.? '</UNA>
    <UNB>
        <UNB0>
            <UNB00>UNOY</UNB00>
            <UNB01>4</UNB01>


## Mapping EDI/XML

In [15]:
from edixml import parse_xml

edi == make_edi(parse_xml(xml))


True

In [16]:
edi == make_edi(parse_xml(make_xml(parse_edi(edi))))

True

In [17]:
edi == make_edi(parse_xml(make_xml(parse_edi(edmoji))))

True

## Experimental - D18A with Service-Segments (Version 4, Release 2)

### Messages, Segments and Elements in JSON

In [18]:
import json

# The Service-Segments and Service-Elements
v42_sd = json.loads(open('V42-9735-10_service_segments.json').read())
v42_ed = json.loads(open('V42-9735-10_service_codes.json').read())

# The Segments, Elements and Messages
d18a_sd = json.loads(open('d18a_segments.json').read())
d18a_ed = json.loads(open('d18a_codes.json').read())
d18a_md = json.loads(open('d18a_messages.json').read())  # only description

sd = {**v42_sd, **d18a_sd}
ed = {**v42_ed, **d18a_ed}
md = {**d18a_md}

total_codes = sum(len(entry.get('table', ())) for entry in ed.values())
        
print(f"Version: D18A, Messages: {len(md)}, Segments: {len(sd)}, Codes: {total_codes}")


Version: D18A, Messages: 210, Segments: 190, Codes: 12092


### Segment-Definitions

In [19]:
sd['LIN']['name'], sd['LIN']['description'], sd['LIN']['table'][0]

('LINE ITEM',
 'Function: To identify a line item and configuration.',
 {'code': '1082',
  'mc': 'C',
  'name': 'LINE ITEM IDENTIFIER',
  'pos': '010',
  'repeat': 1,
  'representation': 'an..6'})

### Element Definitions

In [20]:
ed['7143']['name'], ed['7143']['table']['IB']

('Item type identification code',
 {'description': 'A unique number identifying a book.',
  'name': 'ISBN (International Standard Book Number)'})
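To illustrate how such a definition can be used, the following sketch translates a code value into its name. The dictionary `ed_sample` only mirrors the structure shown above and contains a single entry; `describe_code` is a hypothetical helper and not part of `edixml`.

```python
# Sample data in the shape of the element definitions shown above.
ed_sample = {
    '7143': {
        'name': 'Item type identification code',
        'table': {
            'IB': {
                'name': 'ISBN (International Standard Book Number)',
                'description': 'A unique number identifying a book.',
            },
        },
    },
}


def describe_code(definitions, element, value):
    """Return a readable label for a code value, falling back to the raw value."""
    entry = definitions.get(element, {})
    label = entry.get('name', element)
    code = entry.get('table', {}).get(value)
    if code is None:
        return f"{label}: {value}"
    return f"{label}: {value} = {code['name']}"


print(describe_code(ed_sample, '7143', 'IB'))
# Item type identification code: IB = ISBN (International Standard Book Number)
```

Unknown elements or values fall back to the raw input, which keeps the lookup usable with incomplete code tables.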

### Message Definition

In [21]:
print(md['ORDERS']['description'][:1200], '...')

Pos     Tag Name                                        S   R
   
            HEADER SECTION   
   
00010   UNH Message header                              M   1     
00020   BGM Beginning of message                        M   1     
00030   DTM Date/time/period                            M   35    
00040   PAI Payment instructions                        C   1     
00050   ALI Additional information                      C   5     
00060   IMD Item description                            C   999   
00070   FTX Free text                                   C   99    
00080   GIR Related identification numbers              C   10    
   
00090       ---- Segment group 1  ------------------    C   9999-------------+
00100   RFF Reference                                   M   1                |
00110   DTM Date/time/period                            C   5----------------+
   
00120       ---- Segment group 2  ------------------    C   99---------------+
00130   NAD Name and address            

### Reporting
Helper function that quickly translates an arbitrary message into a human-readable report.

In [22]:
from edixml import report

print(report(segments, sd, ed))

UNB+UNOY:4+INVALIDATORSTUDIO:1+BYTESREADER:1+20180630:1159+6002'
----------------------------------------------------------------
Interchange header <UNB>
  SYNTAX IDENTIFIER (S001)
    Syntax identifier <UNOY> (0001) UN/ECE level Y
    Syntax version number <4> (0002) Version 4
  INTERCHANGE SENDER (S002)
    Interchange sender identification <INVALIDATORSTUDIO> (0004)
    Identification code qualifier <1> (0007) DUNS (Data Universal Numbering System)
  INTERCHANGE RECIPIENT (S003)
    Interchange recipient identification <BYTESREADER> (0010)
    Identification code qualifier <1> (0007) DUNS (Data Universal Numbering System)
  DATE AND TIME OF PREPARATION (S004)
    Date <20180630> (0017)
    Time <1159> (0019)
    Interchange control reference <6002> (0020)

UNH+123456+ORDERS:D:18A:UN:EAN008'
----------------------------------
MESSAGE HEADER <UNH>
    Message reference number <123456> (0062)
  MESSAGE IDENTIFIER (S009)
    Message type <ORDERS> (0065) Purchase order message
    Messa

### Descriptive EDI-XML

In [23]:
from edixml import make_edi_xml

edifact_xml = make_edi_xml(segments, sd, ed)
print(pretty_xml(edifact_xml)[:300], '...')

<?xml version="1.0" ?>
<EDIFACT>
    <UNA>:+.? '</UNA>
    <UNB description="Function: To identify an interchange" name="Interchange header">
        <UNB0 code="S001" mc="M" name="SYNTAX IDENTIFIER" pos="10" repeat="1">
            <UNB00 code="0001" description="ISO 10646-1 octet without code exte ...


### Mapping

In [24]:
edi == make_edi(parse_xml(edifact_xml))

True