From edd4b080b02b86df6defdf341ef31c70f9207a79 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Herv=C3=A9=20M=C3=A9nager?= Date: Thu, 13 Oct 2016 08:44:37 +0200 Subject: [PATCH 01/10] update next_id --- EDAM_dev.owl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/EDAM_dev.owl b/EDAM_dev.owl index cd9497c..ec4f2d9 100644 --- a/EDAM_dev.owl +++ b/EDAM_dev.owl @@ -41,7 +41,7 @@ operations "EDAM operations" Bioinformatics operations, data types, formats, identifiers and topics EDAM http://edamontology.org/ "EDAM relations and concept properties" - 3770 + 3771 application/rdf+xml 12.05.2016 18:23 GMT EDAM_data http://edamontology.org/data_ "EDAM types of data" From 46a09befcb41f2e483001c0c37c28008cbc45c67 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Herv=C3=A9=20M=C3=A9nager?= Date: Thu, 13 Oct 2016 10:04:19 +0200 Subject: [PATCH 02/10] Peptide-spectrum-matching is not a synonym of Protein identification Duplicate synonyms detected by edamxpathvalidator: Error : multiple concepts with the same namespace have the same synonym 'Peptide-spectrum-matching' 'Peptide identification' (http://edamontology.org/operation_3631) -> 'Peptide identification' (http://edamontology.org/operation_3631) --- EDAM_dev.owl | 1 - 1 file changed, 1 deletion(-) diff --git a/EDAM_dev.owl b/EDAM_dev.owl index ec4f2d9..a5b0084 100644 --- a/EDAM_dev.owl +++ b/EDAM_dev.owl @@ -46515,7 +46515,6 @@ Trim sequences (typically from an automated DNA sequencer) to remove sequence-sp - Peptide-spectrum-matching Protein inference Identification of protein, for example from one or more peptide identifications by tandem mass spectrometry. 1.16 From 71ece5928166c8aa207d024f8b464a11ffe7638f Mon Sep 17 00:00:00 2001 From: matuskalas Date: Thu, 13 Oct 2016 15:05:18 +0200 Subject: [PATCH 03/10] Minor polish of concept properties --- EDAM_dev.owl | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/EDAM_dev.owl b/EDAM_dev.owl index a5b0084..ef72650 100644 --- a/EDAM_dev.owl +++ b/EDAM_dev.owl @@ -125,6 +125,7 @@ File extension 'File extension' concept property ('file_extension' metadata tag) lists examples of usual file extensions of formats. Separated by bar ('|'), without a dot ('.') prefix, preferrably not all capital characters. + N.B.: File extensions that are not correspondigly defined at http://filext.com are recorded in EDAM only if not in conflict with http://filext.com, and/or unique and usual within life-science computing. concept_properties true @@ -135,7 +136,7 @@ isdebtag - When 'true', the term has been proposed or is supported within Debian Med as a tag. + When 'true', the concept has been proposed or is supported within Debian as a tag. concept_properties true @@ -154,9 +155,9 @@ - + - + @@ -33267,9 +33268,9 @@ experiments employing a combination of technologies. YAML Ain't Markup Language YAML (YAML Ain't Markup Language) is a human-readable tree-structured data serialisation language. yaml|yml - - + + @@ -44840,14 +44841,14 @@ Trim sequences (typically from an automated DNA sequencer) to remove sequence-sp Format recognition 'Format recognition' is not a bioinformatics-specific operation, but of great relevance in bioinformatics. Should be removed from EDAM if/when captured satisfactorily in a suitable domain-generic ontology. Format inference - + The has_input "Data" (data_0006) may cause visualisation or other problems although ontologically correct. But on the other hand it may be useful to distinguish from nullary operations without inputs. - - + + From da34431808d3158c4a0b69dc3214b9bc0386b845 Mon Sep 17 00:00:00 2001 From: matuskalas Date: Thu, 13 Oct 2016 15:39:52 +0200 Subject: [PATCH 04/10] Added UniProtKB formats (fixes #221) --- EDAM_dev.owl | 32 +++++++++++++++++++++++++++++--- 1 file changed, 29 insertions(+), 3 deletions(-) diff --git a/EDAM_dev.owl b/EDAM_dev.owl index ef72650..6932f06 100644 --- a/EDAM_dev.owl +++ b/EDAM_dev.owl @@ -41,7 +41,7 @@ operations "EDAM operations" Bioinformatics operations, data types, formats, identifiers and topics EDAM http://edamontology.org/ "EDAM relations and concept properties" - 3771 + 3772 application/rdf+xml 12.05.2016 18:23 GMT EDAM_data http://edamontology.org/data_ "EDAM types of data" @@ -33354,16 +33354,42 @@ experiments employing a combination of technologies. - UniProt XML format + UniProtKB XML + UniProt XML + UniProtKB XML format + UniProt XML format + 1.16 - XML sequence format used for UniProt entries. + UniProtKB XML sequence features format is an XML format available for downloading UniProt entries. + + + + + + UniProtKB RDF + UniProt RDF + UniProtKB RDF format + UniProt RDF format + UniProtKB RDF/XML + UniProt RDF/XML + UniProtKB RDF/XML format + UniProt RDF/XML format + + + 1.16 + UniProtKB RDF sequence features format is an RDF format (RDF/XML) available for downloading UniProt entries. + + + + + From 55ccbd5c136758509c0466f7bd2c4c3bb117a739 Mon Sep 17 00:00:00 2001 From: matuskalas Date: Thu, 13 Oct 2016 23:36:30 +0200 Subject: [PATCH 05/10] Added and refined some formats --- EDAM_dev.owl | 270 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 258 insertions(+), 12 deletions(-) diff --git a/EDAM_dev.owl b/EDAM_dev.owl index 6932f06..8dcff5e 100644 --- a/EDAM_dev.owl +++ b/EDAM_dev.owl @@ -41,7 +41,7 @@ operations "EDAM operations" Bioinformatics operations, data types, formats, identifiers and topics EDAM http://edamontology.org/ "EDAM relations and concept properties" - 3772 + 3777 application/rdf+xml 12.05.2016 18:23 GMT EDAM_data http://edamontology.org/data_ "EDAM types of data" @@ -29507,11 +29507,12 @@ - Format (typed) + Format (by type of data) + Format (typed) This concept exists only to assist EDAM maintenance and navigation in graphical browsers. It does not add semantic information. The concept branch under 'Format (typed)' provides an alternative organisation of the concepts nested under the other top-level branches ('Binary', 'HTML', 'RDF', 'Text' and 'XML'. All concepts under here are already included under those branches. beta12orEarlier - A broad class of format distinguished by the scientific nature of the data that is identified. + A placeholder concept for visual navigation by dividing data formats by the content of the data that is represented. @@ -29521,7 +29522,20 @@ - BioXSD + BioXSD (XML) + BioXSD + BioXSD format + BioXSD data model + BioXSD/GTrack + BioXSD|GTrack + BioXSD|BioJSON|BioYAML + BioXSD in XML + BioXSD XML + BioXSD XML format + BioXSD in XML format + beta12orEarlier + BioXSD schema-based XML format of sequence-based data and some other common data - sequence records, alignments, feature records, references to resources, and more - optimised for integrative bioinformatics, Web services, and object-oriented programming. + 'BioXSD' belongs to the 'BioXSD|GTrack' ecosystem of generic formats. 'BioXSD in XML' is the XML format based on the common, unified 'BioXSD data model', a.k.a. 'BioXSD|BioJSON|BioYAML'. @@ -29545,13 +29559,23 @@ - BioXSD XML format - beta12orEarlier - BioXSD XML format of basic bioinformatics types of data (sequence records, alignments, feature records, references to resources, and more). + + + + + + + + + + + + + @@ -30718,10 +30742,18 @@ GTrack + BioXSD/GTrack + BioXSD|GTrack + GTrack ecosystem of formats + GTrack|GSuite|BTrack + GTrack|BTrack|GSuite + GTrack format + 1.0 - GTrack is an optimised tabular format for genome/sequence feature tracks unifying the power of other tabular formats (e.g. GFF3, BED, WIG). + GTrack is a generic and optimised tabular format for genome or sequence feature tracks. GTrack unifies the power of other track formats (e.g. GFF3, BED, WIG), and while optimised in size, adds more flexibility, customisation, and automation ("machine understandability"). + 'GTrack' belongs to the 'BioXSD|GTrack' ecosystem of generic formats, and particular to its subset, the 'GTrack ecosystem' (GTrack, GSuite, BTrack). 'GTrack' is the tabular format for representing features of sequences and genomes. @@ -31449,6 +31481,7 @@ + @@ -33284,9 +33317,9 @@ experiments employing a combination of technologies. 1.16 Tabular data represented as values in a text file delimited by some character. - Tabular format + Tabular format Delimiter-separated values - https://en.wikipedia.org/wiki/Delimiter-separated_values + @@ -33299,9 +33332,11 @@ experiments employing a combination of technologies. CSV Comma-separated values - http://filext.com/file-extension/CSV + csv + 1.16 - http://www.iana.org/assignments/media-types/text/csv + + Tabular data represented as comma-separated values in a text file. @@ -33390,6 +33425,217 @@ experiments employing a combination of technologies. + + + + + + BioJSON (BioXSD) + BioXSD + BioXSD format + BioXSD data model + BioXSD/GTrack + BioXSD|GTrack + BioXSD|BioJSON|BioYAML + BioXSD in JSON format + BioXSD in JSON + BioXSD JSON format + BioXSD JSON + BioXSD BioJSON format + BioXSD BioJSON + 1.16 + BioJSON is a BioXSD schema-based JSON format of sequence-based data and some other common data - sequence records, alignments, feature records, references to resources, and more - optimised for integrative bioinformatics, web applications and APIs, and object-oriented programming. + Work in progress. 'BioXSD' belongs to the 'BioXSD|GTrack' ecosystem of generic formats. 'BioJSON' is the JSON format based on the common, unified 'BioXSD data model', a.k.a. 'BioXSD|BioJSON|BioYAML'. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + BioYAML + BioXSD + BioXSD format + BioXSD data model + BioXSD/GTrack + BioXSD|GTrack + BioXSD|BioJSON|BioYAML + BioXSD in YAML format + BioXSD in YAML + BioXSD YAML format + BioXSD YAML + BioXSD BioYAML format + BioXSD BioYAML + BioYAML format + 1.16 + BioYAML is a BioXSD schema-based YAML format of sequence-based data and some other common data - sequence records, alignments, feature records, references to resources, and more - optimised for integrative bioinformatics, web APIs, human readability and editting, and object-oriented programming. + Work in progress. 'BioXSD' belongs to the 'BioXSD|GTrack' ecosystem of generic formats. 'BioYAML' is the YAML format based on the common, unified 'BioXSD data model', a.k.a. 'BioXSD|BioJSON|BioYAML'. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + BioJSON (Jalview) + BioJSON format (Jalview) + Jalview BioJSON format + Jalview BioJSON + JSON format (Jalview) + JSON (Jalview) + Jalview JSON format + Jalview JSON + 1.16 + BioJSON. + + + + + + + + + + + + + + + + + + + + + + + + + + + GSuite + BioXSD/GTrack + BioXSD|GTrack + GTrack ecosystem of formats + GTrack|GSuite|BTrack + GTrack|BTrack|GSuite + GSuite format + + + 1.16 + GSuite is a tabular format for collections of genome or sequence feature tracks, suitable for integrative multi-track analysis. GSuite contains links to genome/sequence tracks, with additional metadata. + 'GSuite' belongs to the 'BioXSD|GTrack' ecosystem of generic formats, and particular to its subset, the 'GTrack ecosystem' (GTrack, GSuite, BTrack). 'GSuite' is the tabular format for an annotated collection of individual GTrack files. + + + + + + + + + + + + + + BTrack + BioXSD/GTrack + BioXSD|GTrack + GTrack ecosystem of formats + GTrack|GSuite|BTrack + GTrack|BTrack|GSuite + BTrack format + + + + + 1.16 + BTrack is an HDF5-based binary format for genome or sequence feature tracks and their collections, suitable for integrative multi-track analysis. BTrack is a binary, compressed alternative to the GTrack and GSuite formats. + 'BTrack' belongs to the 'BioXSD|GTrack' ecosystem of generic formats, and particular to its subset, the 'GTrack ecosystem' (GTrack, GSuite, BTrack). 'BTrack' is the binary, optionally compressed HDF5-based version of the GTrack and GSuite formats. + + + + From 3da1d083d356bde2138d94868f859e83eae6ac17 Mon Sep 17 00:00:00 2001 From: matuskalas Date: Thu, 13 Oct 2016 23:49:08 +0200 Subject: [PATCH 06/10] Cripled duplicate broad synonyms --- EDAM_dev.owl | 53 +++++++++++++++++++++++++++-------------------------- 1 file changed, 27 insertions(+), 26 deletions(-) diff --git a/EDAM_dev.owl b/EDAM_dev.owl index 8dcff5e..0113f6f 100644 --- a/EDAM_dev.owl +++ b/EDAM_dev.owl @@ -30742,11 +30742,11 @@ GTrack - BioXSD/GTrack - BioXSD|GTrack + BioXSD/GTrack GTrack + BioXSD|GTrack GTrack GTrack ecosystem of formats - GTrack|GSuite|BTrack - GTrack|BTrack|GSuite + GTrack|GSuite|BTrack GTrack + GTrack|BTrack|GSuite GTrack GTrack format @@ -33431,12 +33431,12 @@ experiments employing a combination of technologies. BioJSON (BioXSD) - BioXSD - BioXSD format - BioXSD data model - BioXSD/GTrack - BioXSD|GTrack - BioXSD|BioJSON|BioYAML + BioXSD BioJSON + BioXSD BioJSON format + BioJSON (BioXSD data model) + BioXSD/GTrack BioJSON + BioXSD|GTrack BioJSON + BioXSD|BioJSON|BioYAML BioJSON BioXSD in JSON format BioXSD in JSON BioXSD JSON format @@ -33494,12 +33494,13 @@ experiments employing a combination of technologies. BioYAML - BioXSD - BioXSD format - BioXSD data model - BioXSD/GTrack - BioXSD|GTrack - BioXSD|BioJSON|BioYAML + BioXSD BioYAML + BioXSD BioYAML format + BioYAML (BioXSD data model) + BioYAML (BioXSD) + BioXSD/GTrack BioYAML + BioXSD|GTrack BioYAML + BioXSD|BioJSON|BioYAML BioYAML BioXSD in YAML format BioXSD in YAML BioXSD YAML format @@ -33594,11 +33595,11 @@ experiments employing a combination of technologies. GSuite - BioXSD/GTrack - BioXSD|GTrack - GTrack ecosystem of formats - GTrack|GSuite|BTrack - GTrack|BTrack|GSuite + BioXSD/GTrack GSuite + BioXSD|GTrack GSuite + GSuite (GTrack ecosystem of formats) + GTrack|GSuite|BTrack GSuite + GTrack|BTrack|GSuite GSuite GSuite format @@ -33619,11 +33620,11 @@ experiments employing a combination of technologies. BTrack - BioXSD/GTrack - BioXSD|GTrack - GTrack ecosystem of formats - GTrack|GSuite|BTrack - GTrack|BTrack|GSuite + BioXSD/GTrack BTrack + BioXSD|GTrack BTrack + BTrack (GTrack ecosystem of formats) + GTrack|GSuite|BTrack BTrack + GTrack|BTrack|GSuite BTrack BTrack format From 95ca81d2dc502aca045d7ce009c160245dae3846 Mon Sep 17 00:00:00 2001 From: matuskalas Date: Fri, 14 Oct 2016 09:07:06 +0200 Subject: [PATCH 07/10] Minor polishing of format synonyms --- EDAM_dev.owl | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/EDAM_dev.owl b/EDAM_dev.owl index 0113f6f..cc0ef01 100644 --- a/EDAM_dev.owl +++ b/EDAM_dev.owl @@ -29531,6 +29531,7 @@ BioXSD|BioJSON|BioYAML BioXSD in XML BioXSD XML + BioXSD+XML BioXSD XML format BioXSD in XML format beta12orEarlier @@ -33412,14 +33413,14 @@ experiments employing a combination of technologies. UniProt RDF UniProtKB RDF format UniProt RDF format - UniProtKB RDF/XML - UniProt RDF/XML - UniProtKB RDF/XML format - UniProt RDF/XML format + UniProtKB RDF/XML + UniProt RDF/XML + UniProtKB RDF/XML format + UniProt RDF/XML format 1.16 - UniProtKB RDF sequence features format is an RDF format (RDF/XML) available for downloading UniProt entries. + UniProtKB RDF sequence features format is an RDF format available for downloading UniProt entries (in RDF/XML). @@ -33437,12 +33438,12 @@ experiments employing a combination of technologies. BioXSD/GTrack BioJSON BioXSD|GTrack BioJSON BioXSD|BioJSON|BioYAML BioJSON + BioJSON format (BioXSD) BioXSD in JSON format BioXSD in JSON BioXSD JSON format BioXSD JSON - BioXSD BioJSON format - BioXSD BioJSON + BioXSD+JSON 1.16 BioJSON is a BioXSD schema-based JSON format of sequence-based data and some other common data - sequence records, alignments, feature records, references to resources, and more - optimised for integrative bioinformatics, web applications and APIs, and object-oriented programming. Work in progress. 'BioXSD' belongs to the 'BioXSD|GTrack' ecosystem of generic formats. 'BioJSON' is the JSON format based on the common, unified 'BioXSD data model', a.k.a. 'BioXSD|BioJSON|BioYAML'. @@ -33498,6 +33499,7 @@ experiments employing a combination of technologies. BioXSD BioYAML format BioYAML (BioXSD data model) BioYAML (BioXSD) + BioYAML format (BioXSD) BioXSD/GTrack BioYAML BioXSD|GTrack BioYAML BioXSD|BioJSON|BioYAML BioYAML @@ -33505,8 +33507,7 @@ experiments employing a combination of technologies. BioXSD in YAML BioXSD YAML format BioXSD YAML - BioXSD BioYAML format - BioXSD BioYAML + BioXSD+YAML BioYAML format 1.16 BioYAML is a BioXSD schema-based YAML format of sequence-based data and some other common data - sequence records, alignments, feature records, references to resources, and more - optimised for integrative bioinformatics, web APIs, human readability and editting, and object-oriented programming. From fb9ca819c856e8ca8d3c0ff007902ad1b01b8bb4 Mon Sep 17 00:00:00 2001 From: matuskalas Date: Fri, 14 Oct 2016 14:06:11 +0200 Subject: [PATCH 08/10] Added format attributes --- EDAM_dev.owl | 34 ++++++++++++++++++++++++++++++---- 1 file changed, 30 insertions(+), 4 deletions(-) diff --git a/EDAM_dev.owl b/EDAM_dev.owl index cc0ef01..64827a2 100644 --- a/EDAM_dev.owl +++ b/EDAM_dev.owl @@ -130,6 +130,20 @@ true + + + + + + Information standard + Minimum information standard + Minimum information checklist + 'Information standard' trailing modifier (qualifier, 'information_standard') of 'xref' links of 'Format' concepts. When 'true', the link is pointing to an information standard supported by the given data format. + "Supported by the given data format" here means, that the given format enables representation of data that satisfies the information standard. + true + concept_properties + + @@ -141,10 +155,10 @@ true - - + + - + Media type MIME type @@ -153,6 +167,18 @@ concept_properties + + + + + + Organisation + Organization + 'Organisation' trailing modifier (qualifier, 'organisation') of 'xref' links of 'Format' concepts. When 'true', the link is pointing to an organisation that developed, standardised, and maintains the given data format. + true + concept_properties + + @@ -33514,7 +33540,7 @@ experiments employing a combination of technologies. Work in progress. 'BioXSD' belongs to the 'BioXSD|GTrack' ecosystem of generic formats. 'BioYAML' is the YAML format based on the common, unified 'BioXSD data model', a.k.a. 'BioXSD|BioJSON|BioYAML'. - + From b4739514316cc9ff78face7220cb29fae0727365 Mon Sep 17 00:00:00 2001 From: matuskalas Date: Fri, 14 Oct 2016 14:36:48 +0200 Subject: [PATCH 09/10] Fixed MHTML format --- EDAM_dev.owl | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/EDAM_dev.owl b/EDAM_dev.owl index 64827a2..d41f07f 100644 --- a/EDAM_dev.owl +++ b/EDAM_dev.owl @@ -31804,12 +31804,28 @@ - MHT - MIME HTML format for Web pages, which can include external resources, including images, Flash animations and so on. + MHTML + MHT + MIME HTML + MHTML format + MHT format + MIME HTML format + HTML email format + HTML email message format + MIME multipart + MIME multipart format + MIME multipart message + MIME multipart message format + MIME HTML format for Web pages, which can include external resources, including images, Flash animations and so on. + MHTML is not strictly an HTML format, it is encoded as an HTML email message (although with multipart/related instead of multipart/alternative). It, however, contains the main HTML block as its core, and thus it is for practical reasons incuded in EDAM as a specialisation of 'HTML'. - EMBL entry format wrapped in HTML elements. 1.9 - MHTML + mhtml|mht|eml + + + + + From 503bf2edc1d2e28826107304ce67cb89af620580 Mon Sep 17 00:00:00 2001 From: matuskalas Date: Fri, 14 Oct 2016 22:48:35 +0200 Subject: [PATCH 10/10] Updates of formats --- EDAM_dev.owl | 39 +++++++++++++++++++++++++++++++++++---- 1 file changed, 35 insertions(+), 4 deletions(-) diff --git a/EDAM_dev.owl b/EDAM_dev.owl index d41f07f..7e0345b 100644 --- a/EDAM_dev.owl +++ b/EDAM_dev.owl @@ -113,7 +113,7 @@ Example 'Example' concept property ('example' metadata tag) lists examples of valid values of types of identifiers (accessions). Applicable to some other types of data, too. true - Separated by bar ('|'). + Separated by bar ('|'). For more complex data and data formats, it can be a link to a website with examples, instead. concept_properties @@ -207,6 +207,17 @@ true + + + + + + Ontology used + 'Ontology used' concept property ('ontology_used' metadata tag) of format concepts links to a domain ontology that is used inside the given data format, or contains a note about ontology use within the format. + concept_properties + true + + @@ -27558,8 +27569,9 @@ Generic Feature Format version 3 (GFF3) of sequence features. - - + + + @@ -29600,6 +29612,13 @@ + + + + + + + Any ontology allowed, none mandatory. Preferrably with URIs but URIs are not mandatory. Non-ontology terms are also allowed as the last resort in case of a lack of suitable ontology. @@ -30783,6 +30802,11 @@ 'GTrack' belongs to the 'BioXSD|GTrack' ecosystem of generic formats, and particular to its subset, the 'GTrack ecosystem' (GTrack, GSuite, BTrack). 'GTrack' is the tabular format for representing features of sequences and genomes. + + + + + @@ -33526,6 +33550,8 @@ experiments employing a combination of technologies. + + @@ -33591,6 +33617,8 @@ experiments employing a combination of technologies. + + @@ -33651,7 +33679,10 @@ experiments employing a combination of technologies. 'GSuite' belongs to the 'BioXSD|GTrack' ecosystem of generic formats, and particular to its subset, the 'GTrack ecosystem' (GTrack, GSuite, BTrack). 'GSuite' is the tabular format for an annotated collection of individual GTrack files. - + + + +