# ShEX shapes creation from Bioschemas YAML

* Authors: Leyla Garcia (1)
* (1) ZBMED Information Centre for Life Sciences, Cologne, Germany

* GitHub repository:  https://github.com/biotea/validation-shapes-bioschemas
* License: Apache 2.0

* Acknowledgements: This notebook was created during the NBDC / DBCLS BioHackathon 2019. We thank the organizers for their invitation to participate in this event. We also thank the Schemas group formed during the event, with special thanks to Jose Labra.

## Input
* Bioschemas YAML file 
* Example at https://github.com/biotea/validation-shapes-bioschemas/blob/master/journal.yaml

## Output
* ShEX shape
* Example at https://github.com/biotea/validation-shapes-bioschemas/blob/master/generatedJournal.shex

## Process
* Make sure journal.yaml is in the same directory as this notebook
* This notebook works as follows:
  * Load a YAML file generated with https://github.com/BioSchemas/bioschemas-goweb (or a compatible one created by any other means)
  * For now, only local files are supported (ToDo: allow loading from a URL and from Bioschemas HTML pages on github.io)
  * Parse the profile properties in order to generate shapes
  * Call the main function, parseProperties, three times: once each for the minimum, recommended and optional properties
  * Add the partial shapes to the final profile shape
* Disclaimer: We have tested this ShEX shape creator with the Biotea-Bioschemas profile for Journal; further testing and adjustments are needed. Please report any bugs via GitHub issues
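The three-pass process above can be sketched in isolation. This is a minimal, self-contained illustration rather than code from the notebook: the helper names `group_by_marginality` and `cardinality_symbol` and the sample records are hypothetical, while the symbol pairs are the ones the notebook passes to parseProperties for each group.

```python
# Cardinality symbols handed to parseProperties for each marginality group,
# as (symbol used when cardinality is ONE, symbol used otherwise).
# In ShEx, no symbol means exactly one, '+' one or more,
# '?' zero or one and '*' zero or more.
CARDINALITY_SYMBOLS = {
    'Minimum': ('', '+'),
    'Recommended': ('?', '*'),
    'Optional': ('?', '*'),
}


def group_by_marginality(records):
    """Split property records into the three groups; other values are skipped."""
    groups = {marginality: [] for marginality in CARDINALITY_SYMBOLS}
    for record in records:
        if record.get('marginality') in groups:
            groups[record['marginality']].append(record)
    return groups


def cardinality_symbol(marginality, cardinality):
    """Return the ShEx cardinality symbol for one property of a given group."""
    symbol_one, symbol_many = CARDINALITY_SYMBOLS[marginality]
    return symbol_one if cardinality == 'ONE' else symbol_many


# Hypothetical records, shaped like the entries parseProperties reads.
records = [
    {'property': 'name', 'marginality': 'Minimum', 'cardinality': 'ONE'},
    {'property': 'url', 'marginality': 'Minimum', 'cardinality': 'MANY'},
    {'property': 'publisher', 'marginality': 'Recommended', 'cardinality': 'ONE'},
]
for marginality, group in group_by_marginality(records).items():
    for record in group:
        print(marginality, record['property'],
              repr(cardinality_symbol(marginality, record['cardinality'])))
# Minimum name ''
# Minimum url '+'
# Recommended publisher '?'
```

Keeping the symbols in one table makes it explicit that recommended and optional properties are constrained identically, and differ from minimum ones only in allowing zero occurrences.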


## Trying out the shape validation
* Go to http://rdfshape.weso.es/validate
* Run the validator with the generated shapes and the input example; everything should pass


In [None]:
#Import libraries (the yaml module is provided by PyYAML)
import yaml

In [None]:
#Load the YAML file into a list of property records (one dict per property)
with open('journal.yaml', 'r') as stream:
    data = yaml.safe_load(stream)
data
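Since every later step indexes each record by key, a quick check of the loaded data can save a confusing `KeyError` later on. The helper below, `find_record_problems`, is hypothetical (not part of the notebook) and assumes, as the marginality filters in the environment cell do, that the YAML file loads as a list of dicts; the sample records are invented for illustration. In the notebook you would call it on `data`.

```python
# Keys that parseExpectedTypes and parseProperties read from every record.
REQUIRED_KEYS = ('property', 'marginality', 'expected_types', 'cardinality')
# Records with any other marginality are silently dropped by the group filters.
MARGINALITIES = ('Minimum', 'Recommended', 'Optional')


def find_record_problems(records):
    """Return a list of human-readable problems in the loaded YAML records."""
    if not isinstance(records, list):
        return ['top level of the YAML file should be a list of property records']
    problems = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f'record {i}: expected a mapping, got {type(record).__name__}')
            continue
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            problems.append(f'record {i}: missing {", ".join(missing)}')
        if 'marginality' in record and record['marginality'] not in MARGINALITIES:
            problems.append(f"record {i}: unknown marginality {record['marginality']!r}")
    return problems


# Hypothetical sample: a lower-case marginality and a missing key.
sample = [
    {'property': 'name', 'marginality': 'Minimum',
     'expected_types': ['Text'], 'cardinality': 'ONE'},
    {'property': 'issn', 'marginality': 'minimum',
     'expected_types': ['Text'], 'cardinality': 'MANY'},
    {'property': 'url', 'marginality': 'Recommended', 'cardinality': 'ONE'},
]
for problem in find_record_problems(sample):
    print(problem)
# record 1: unknown marginality 'minimum'
# record 2: missing expected_types
```

The marginality check matters because a misspelled group name would not raise an error; the property would simply be missing from every generated shape.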

In [None]:
#Parse expected types and populate arrays for data and object types
def parseExpectedTypes(elem, exprDataType, exprObjType):
    for exType in elem['expected_types']:
        if exType == 'URL':
            exprDataType.append('@<URL>')
        elif exType == 'Text': 
            exprDataType.append('xsd:string')
        elif exType == 'Boolean':
            exprDataType.append('xsd:boolean')
        else:
            exprObjType.append(exType) 

#Parse object-type properties to get the constraint for the main shape and any additional supporting shapes
#(a supporting shape lists all allowed types when a property accepts more than one)
def parseObjProperties (exprObjType, addShapes):
    shape = ''
    if len(exprObjType) == 1:
        shape += ' {a [schema:' + exprObjType[0] + ']} OR IRI'
    elif len(exprObjType) > 0:
        separator = 'Or'
        exprObjTypeName = '<' + separator.join(exprObjType) + '>'
        shape += ' @' + exprObjTypeName
        shape += ' OR IRI'
        separator = ' schema:'
        #Prefix every type with ' schema:' without modifying the caller's list
        typeList = separator.join([''] + exprObjType)
        addShapes.append('\n' + exprObjTypeName + '{\n  rdf:type [' + typeList + ']\n}')
    return shape

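#Parse properties corresponding to a particular group: Minimum, Recommended or Optional
#Note: the rdf:type constraint uses the global variable profileType, which must be set (see the environment cell) before calling this function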
def parseProperties (profile, propList, marginality, symbolOne, symbolMany, addShapes):  
    shape = '\n<' + profile + marginality + '> {\n  rdf:type [' + profileType + '] ;'
    for elem in propList:
        shape += '\n  schema:' + elem['property']
        exprDataType = []
        exprObjType = []
        shapeObjType = ''
        
        parseExpectedTypes(elem, exprDataType, exprObjType)
        
        separator = ' OR '
        shape += ' ' + separator.join(exprDataType)

        if (len(exprDataType) > 0) and  (len(exprObjType) > 0):
            shape += ' OR'

        shape += parseObjProperties (exprObjType, addShapes)

        if elem['cardinality'] == 'ONE':
            shape += ' ' + symbolOne
        else:
            shape += ' ' + symbolMany
        
        shape += ' ;'
        
    shape += '\n}\n'  
    return shape
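parseObjProperties appends one supporting shape per property, so two properties that accept the same combination of object types (say, Organization and Person) would produce two shapes with the same label. Below is a minimal sketch of a hypothetical variant, `obj_type_constraint`, that keeps the supporting shapes in a dict keyed by label so each one is emitted only once, and builds the type list without modifying its input. The fragment strings follow the format this notebook already generates; the type names in the usage lines are illustrative.

```python
def obj_type_constraint(obj_types, add_shapes):
    """Hypothetical variant of parseObjProperties.

    Returns the ShEx fragment for the object types of one property and records
    any supporting shape in add_shapes (a dict keyed by shape label), so a type
    combination shared by several properties is emitted only once.
    The obj_types list passed in is never modified.
    """
    if not obj_types:
        return ''
    if len(obj_types) == 1:
        # A single type is checked inline: a node typed schema:X, or just an IRI.
        return ' {a [schema:' + obj_types[0] + ']} OR IRI'
    # Several types: reference a supporting shape such as <OrganizationOrPerson>.
    label = '<' + 'Or'.join(obj_types) + '>'
    type_list = ''.join(' schema:' + t for t in obj_types)
    add_shapes.setdefault(label, '\n' + label + '{\n  rdf:type [' + type_list + ']\n}')
    return ' @' + label + ' OR IRI'


shapes = {}
types = ['Organization', 'Person']
print(obj_type_constraint(types, shapes))  # used by one property
print(obj_type_constraint(types, shapes))  # same label reused by another
print(types)        # ['Organization', 'Person'] (unchanged)
print(len(shapes))  # 1
print(shapes['<OrganizationOrPerson>'])
```

To adopt it, addShapes in parseProperties would become a dict and the final `'\n'.join(...)` would run over its values; that wiring is not shown here.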


In [None]:
#Set up environment
profile = 'Journal' #profile name used to label the shapes; hard-coded because it is not in the YAML file (ToDo: parametrize)
profileType = 'schema:Periodical' #schema.org type required by the profile; also not in the YAML file (ToDo: parametrize)

#Prefix declarations plus a shared <URL> shape (a string or an IRI), referenced as @<URL> by URL-typed properties
fullShape = '''PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
<URL>
  xsd:string OR IRI
'''

#Parse minimum properties from the list of records loaded from the YAML file (ONE -> exactly one, otherwise '+' = one or more)
minProp = [el for el in data if el['marginality'] == 'Minimum']
minAddShapes = []
minShape = parseProperties (profile, minProp, 'Minimum', '', '+', minAddShapes)
fullShape += minShape

#Parse recommended properties from the list of records loaded from the YAML file ('?' = zero or one, '*' = zero or more)
recProp = [el for el in data if el['marginality'] == 'Recommended']
recAddShapes = []
recShape = parseProperties (profile, recProp, 'Recommended', '?', '*', recAddShapes)
fullShape += recShape

#Parse optional properties from the list of records loaded from the YAML file (same cardinality symbols as recommended ones)
optProp = [el for el in data if el['marginality'] == 'Optional']
optAddShapes = []
optShape = parseProperties (profile, optProp, 'Optional', '?', '*', optAddShapes)
fullShape += optShape

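#Append the supporting shapes (for properties allowing several object types) collected in each group
separator = '\n'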
fullShape += separator.join(minAddShapes)
fullShape += separator.join(recAddShapes)
fullShape += separator.join(optAddShapes)

print(fullShape)
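Before pasting the result into the validator, it can help to check that every shape referenced with `@<Label>` is defined and that no label is defined twice, which ShEx processors typically reject. The sketch below is a hypothetical helper, `check_shape_labels`, based on regular expressions; it assumes labels are defined at the start of a line and referenced as `@<Label>`, which is how this notebook writes them, and would not handle arbitrary ShEx. The sample text is a hand-written fragment in that format.

```python
import re
from collections import Counter

# Shape labels are written as <Label> at the start of a line when defined,
# and as @<Label> when referenced from a triple constraint.
DEFINITION = re.compile(r'^<([^>]+)>', re.MULTILINE)
REFERENCE = re.compile(r'@<([^>]+)>')


def check_shape_labels(shex_text):
    """Return (undefined, duplicated) label sets for a generated ShEx string."""
    defined = Counter(DEFINITION.findall(shex_text))
    referenced = set(REFERENCE.findall(shex_text))
    undefined = referenced - set(defined)
    duplicated = {label for label, count in defined.items() if count > 1}
    return undefined, duplicated


sample = '''PREFIX schema: <http://schema.org/>
<URL>
  xsd:string OR IRI

<JournalMinimum> {
  schema:url @<URL> ;
  schema:publisher @<OrganizationOrPerson> OR IRI ;
}

<OrganizationOrPerson>{
  rdf:type [ schema:Organization schema:Person]
}
'''
print(check_shape_labels(sample))  # (set(), set())
```

In the notebook, `check_shape_labels(fullShape)` would be run after the cell above; PREFIX lines are ignored because their IRIs are neither at the start of a line nor preceded by `@`.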