# GS1 Digital Link URI Verifier for EPCIS
### Purpose of this notebook: 
* Basis for discussion in EPCIS/CBV 2.0 MSWG
* Potential starting point of an open source artefact to help in applying constrained GS1 Digital Link URIs for populating the *what* dimension (i.e. epcList and quantityList) of EPCIS events 

### Functionality:
* Checks whether a list of GS1 DL URIs conforms to the constrained syntax as defined in CBV 2.0
* Specifically:
 * checks that each GS1 DL URI comprises only the primary identifier at the lowest (i.e. most precise) level, so that it corresponds to the EPC URI/EPC Class URI schemes defined in the EPC Tag Data Standard
 * accepts any user domain/sub-domain
 * checks that, if a GS1 DL URI includes a GTIN, the GTIN is represented in GTIN-14 format, consistent with the definition of GS1 Application Identifier '01'
* Input: list of Digital Link URIs (as they are intended to populate the epcList/quantityList)
* Output: either 'True' (if conformant) or 'False' (if not) for each list element
* Supported Keys/Compound Keys: 
 * GTIN, GTIN + Lot (LGTIN), GTIN + Serial (SGTIN)
 * SSCC 
 * UPUI

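The GTIN-14 requirement means that shorter GTINs must be left-padded with zeros before they are placed after AI '01'. A minimal sketch of that normalisation follows; the helper name `to_gtin14` is an assumption for illustration and is not part of the verifier. It does not validate the check digit.

```python
def to_gtin14(gtin: str) -> str:
    """Left-pad a GTIN-8, GTIN-12 or GTIN-13 with zeros to 14 digits.

    Hypothetical helper: only length and digit-ness are checked,
    the check digit is NOT verified.
    """
    if not (gtin.isascii() and gtin.isdigit()) or len(gtin) not in (8, 12, 13, 14):
        raise ValueError(f"not a GTIN-8/12/13/14: {gtin!r}")
    return gtin.zfill(14)


print(to_gtin14('061414155557'))  # 00061414155557
print(to_gtin14('12345670'))      # 00000012345670
```

Applied to the two GTINs that the verifier rejects in the sample list further down, this yields values that fit the `01/\d{14}` part of the pattern.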
### Status: DRAFT (for discussion purposes)

In [3]:
import re

In [4]:
dlURIList = [
    'https://id.gs1.org/00/340123451111111111',
    'https://id.gs1.org/01/04012345123456',
    'https://id.gs1.org/01/04012345123456/10/ABC',
    'https://example.com/01/04150567890128/21/987654',
    'https://id.gs1.org/01/061414155557',
    'https://id.gs1.org/01/12345670',
    'https://id.gs1.org/sscc/340123451111111111',
]

In [5]:
# Compile the pattern once. re.VERBOSE ignores whitespace and '#' comments inside the pattern.
dlPattern = re.compile(r'''https?:(//((([^\/?#]*)@)?([^\/?#:]*)(:([^\/?#]*))?))?([^?#]*)/( # domain/sub-domain
                    (00/\d{18}$)| # SSCC
                    (01/\d{14}$)| # GTIN
                    # Allowed chars in serial and lot as of GS1 DL standard: " / % / - / . / 0-9 / A-Z / _ / a-z
                    # Lot, serial and UPUI values must contain at least one character
                    (01/\d{14}/10/([\x22\x25\x2d\x2E\x30-\x39\x41-\x5A\x5F\x61-\x7A]{1,20})$)| # LGTIN
                    (01/\d{14}/21/([\x22\x25\x2d\x2E\x30-\x39\x41-\x5A\x5F\x61-\x7A]{1,20})$)| # SGTIN
                    (01/\d{14}/235/([\x22\x25\x2d\x2E\x30-\x39\x41-\x5A\x5F\x61-\x7A]{1,28})$) # UPUI
                    ) ''', re.VERBOSE)

validationList = []
for j in dlURIList:
    # Record 'True' if the URI matches one of the supported key patterns, else 'False'
    validationList.append('True' if dlPattern.match(j) else 'False')
print(validationList)

['True', 'True', 'True', 'True', 'False', 'False', 'False']

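A list of 'True'/'False' values does not show which key a conformant URI carries. The following sketch reports the key type instead. It is a simplified, hypothetical variant of the pattern above: the names `CHARS`, `KEY_PATTERN` and `key_type` are assumptions, and, unlike the notebook's pattern, it requires an explicit `//domain` part.

```python
import re

# Character set copied from the notebook's pattern: " % - . 0-9 A-Z _ a-z
CHARS = r'[\x22\x25\x2D\x2E0-9A-Z_a-z]'

# One named group per supported key; the group that matched names the key type.
KEY_PATTERN = re.compile(
    r'https?://[^/?#]+(?:/[^/?#]+)*/(?:'   # scheme, any domain, optional path prefix
    r'(?P<SSCC>00/\d{18})'
    r'|(?P<UPUI>01/\d{14}/235/' + CHARS + r'{1,28})'
    r'|(?P<LGTIN>01/\d{14}/10/' + CHARS + r'{1,20})'
    r'|(?P<SGTIN>01/\d{14}/21/' + CHARS + r'{1,20})'
    r'|(?P<GTIN>01/\d{14})'
    r')'
)


def key_type(uri):
    """Return 'SSCC', 'GTIN', 'LGTIN', 'SGTIN' or 'UPUI', or None if non-conformant."""
    m = KEY_PATTERN.fullmatch(uri)
    return m.lastgroup if m else None


samples = [
    'https://id.gs1.org/00/340123451111111111',
    'https://id.gs1.org/01/04012345123456',
    'https://id.gs1.org/01/04012345123456/10/ABC',
    'https://example.com/01/04150567890128/21/987654',
    'https://id.gs1.org/01/061414155557',
    'https://id.gs1.org/sscc/340123451111111111',
]
print([key_type(u) for u in samples])
# ['SSCC', 'GTIN', 'LGTIN', 'SGTIN', None, None]
```

Using `fullmatch` rather than `match` with `$` also rejects a URI that ends in a newline.
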

The verifier returns 'False' in the following cases, among others:
* GTIN not formatted as GTIN-14 (i.e. not corresponding to the definition of GS1 Application Identifier (AI) '01')
* Using short names (e.g. 'sscc') instead of their numeric AI equivalents (e.g. '00')
* GTIN-based GS1 DL URI comprising consumer product variant (AI '22') 
* URI containing characters that need to be percent-encoded 
* GS1 DL URI ending with a trailing slash 
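
The percent-encoding case can be avoided when URIs are generated rather than typed by hand. The sketch below builds a GTIN + lot URI and percent-encodes the lot value with `urllib.parse.quote`. The function name `build_lgtin_uri`, its `base` parameter and the lot value 'AB/C' are illustrative assumptions, not part of the notebook.

```python
from urllib.parse import quote


def build_lgtin_uri(gtin14, lot, base='https://id.gs1.org'):
    """Hypothetical helper: build a GTIN + lot GS1 DL URI with an encoded lot value."""
    # safe='' makes quote() encode '/' as well; a raw '/' would split the path segment
    return f"{base}/01/{gtin14}/10/{quote(lot, safe='')}"


print(build_lgtin_uri('04012345123456', 'AB/C'))
# https://id.gs1.org/01/04012345123456/10/AB%2FC
```

The resulting lot segment contains only characters from the set accepted by the pattern above ('%', digits and letters), whereas the unencoded form `.../10/AB/C` would be rejected.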