
Rename definitions.yaml to airr-schema.yaml per #66. (#124)

* Rename definitions.yaml to airr-schema.yaml per #66.
* Updated docs to denote the Alignment spec as experimental, removed metadata information, and updated some xrefs.
javh committed Apr 20, 2018
1 parent 5ad6857 commit 4cd037cd6384770289e8bd0fb851550f5ddb5ed8
@@ -229,7 +229,7 @@ def __getattr__(cls, name): return MagicMock()


 # Load data for schemas
-with open(os.path.abspath('../specs/definitions.yaml')) as ip:
+with open(os.path.abspath('../specs/airr-schema.yaml')) as ip:
     airr_schema = yaml.load(ip, Loader=yamlordereddictloader.Loader)
 html_context = {'airr_schema': airr_schema}

@@ -1,12 +1,13 @@
 .. _AlignmentSchema:

-Alignment Schema
-===========================
+Alignment Schema (Experimental)
+===============================

-See the formatting overview for details on how to structure this data.
+See the :ref:`format overview <DataRepresentations>` for details on
+how to structure this data.

-CIGAR strings must use hard/softclipping to align the full read to the full
-germline reference segment.
+Note, this schema definition is still experimental and should not be
+considered final.

 **Fields**

@@ -7,71 +7,55 @@ Data Representations
    :maxdepth: 3
    :caption: Schema

-   alignments
-   rearrangements
+   Rearrangements Schema <rearrangements>
+   Alignment Schema (Experimental) <alignments>


-Data for ``Rearrangement`` objects are stored as rows in a *tab-delimited* file and
-should be compatible with any TSV reader. Metadata is stored as a companion JSON
-or YAML file according to the MiAIRR specifications.

+Data for ``Rearrangement`` or ``Alignment`` objects are stored as rows in a
+*tab-delimited* file and should be compatible with any TSV reader.

 **Encoding**

 The file should be encoded as ASCII or UTF-8. Everything is case-sensitive.


 **CSV dialect**

 The record separator is a newline ``\n`` and the field separator is a tab ``\t``.
 Fields or data should not be quoted. A header line with the AIRR-specified column
 names is always required.


 **Coordinate numbering**

 To minimize ambiguity of locations/annotations, all sequence coordinates use
 Python-style semantics for locations and intervals. This means 0-indexed coords
 with half-open intervals. See `this example <https://stackoverflow.com/a/509297/510187>`__
 for additional clarity.


 **Boolean values**

 Boolean values must be encoded as ``T`` for true and ``F`` for false.


 **Null values**

 *All fields can be null.* (Even for columns that are described as
 "required".) This should be encoded as an empty string.


 **File names**

-AIRR-formatted data files should end with ``.tsv``. If any metadata is
-incorporated into the filename, consider including it in the accompanying
-metadata file as well. Accompanying metadata files should end in ``.meta.json`` or
-``.meta.yaml``.

+AIRR-formatted data files should end with ``.tsv``.

 **Identifiers/illegal characters**

 Data must not contain tab or newline characters. Data should avoid ``#`` and quote
 characters, as the result may be implementation-dependent.


 **Structure**

 The data file has 2 sections in this order:

 1. Header (single line with column names)
 2. Data (one record per line)

-The metadata file should be readable as a JSON or YAML object according to the
-MiAIRR spec.


 **Header line**

 A single line containing the column names (and also specifying field order).
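The file-format rules in the hunk above (tab separators, no quoting, a mandatory header line, `T`/`F` booleans, and empty strings for nulls) map directly onto Python's standard `csv` module. Below is a minimal sketch, not code from this commit or from the `airr` package: `parse_airr_tsv` is a hypothetical helper, and which columns are boolean is assumed to be supplied by the caller rather than read from the schema.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, Optional, Union

Value = Union[str, bool, None]


def _convert(value: Optional[str], is_bool: bool) -> Value:
    """Apply the null and boolean conventions to one raw field."""
    # Every field may be null; nulls are written as empty strings.
    # (DictReader also yields None when a row is shorter than the header.)
    if value is None or value == "":
        return None
    if is_bool:
        # Case-sensitive: only "T" and "F" are valid booleans.
        if value == "T":
            return True
        if value == "F":
            return False
        raise ValueError(f"invalid boolean value: {value!r}")
    return value


def parse_airr_tsv(
    handle: Iterable[str], bool_fields: Iterable[str] = ()
) -> Iterator[Dict[str, Value]]:
    """Yield one dict per data line; the first line must be the header.

    Hypothetical helper: `bool_fields` names the columns to decode as booleans.
    """
    bools = set(bool_fields)
    # Tab-separated and unquoted, so quote characters are kept as literal data.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {name: _convert(value, name in bools) for name, value in row.items()}


if __name__ == "__main__":
    sample = "sequence_id\tproductive\tjunction\nseq1\tT\tTGTGCG\nseq2\t\t\n"
    for record in parse_airr_tsv(io.StringIO(sample), bool_fields=["productive"]):
        print(record)
```

Passing the boolean column names by hand keeps the sketch independent of the schema file; a fuller reader would presumably derive them from the boolean-typed properties in `airr-schema.yaml`.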
@@ -3,7 +3,8 @@
 Rearrangement Schema
 ===============================

-See the formatting overview for details on how to structure this data.
+See the :ref:`format overview <DataRepresentations>` for details on how
+to structure this data.

 **"Junction" versus "CDR3"**

@@ -14,7 +15,6 @@ and ``junction_aa`` fields which represent the extracted sequence include the tw
 conserved residues, while the coordinate fields (``cdr3_start`` and ``cdr3_end``)
 exclude them.

-
 **Fields**

 :download:`Download as TSV <../_downloads/Rearrangement.tsv>`.
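The junction/CDR3 distinction above, combined with the 0-indexed, half-open coordinate convention described in the data-representation docs of this commit, means both strings can be cut from the same coordinates with plain slicing. The sketch below assumes `cdr3_start`/`cdr3_end` index into the nucleotide query sequence and that each conserved residue occupies exactly one codon; the helper names are hypothetical.

```python
from typing import Tuple

CODON = 3  # assumption: each conserved junction residue is one codon


def extract_cdr3_and_junction(sequence: str, cdr3_start: int, cdr3_end: int) -> Tuple[str, str]:
    """Return (cdr3, junction) nucleotide strings.

    Assumes 0-based, half-open coordinates (Python slice semantics) that
    exclude the two conserved residues.
    """
    if not (CODON <= cdr3_start <= cdr3_end <= len(sequence) - CODON):
        raise ValueError("CDR3 coordinates leave no room for the flanking codons")
    # Half-open: len(cdr3) == cdr3_end - cdr3_start, no +1 adjustment needed.
    cdr3 = sequence[cdr3_start:cdr3_end]
    # The junction adds one codon on each side for the conserved residues.
    junction = sequence[cdr3_start - CODON:cdr3_end + CODON]
    return cdr3, junction


def to_one_based_closed(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based half-open interval to 1-based closed coordinates."""
    return start + 1, end


if __name__ == "__main__":
    seq = "AAATGTGCGAGATGGCCC"
    print(extract_cdr3_and_junction(seq, 6, 12))  # → ('GCGAGA', 'TGTGCGAGATGG')
    print(to_one_based_closed(6, 12))  # → (7, 12)
```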
@@ -85,7 +85,7 @@ setMethod("$",
 #' @export
 load_schema <- function(definition) {
     # Load schema from yaml file
-    spec_file <- system.file("extdata", "definitions.yaml", package="airr")
+    spec_file <- system.file("extdata", "airr-schema.yaml", package="airr")
     spec_list <- yaml.load_file(spec_file)

     # Load definition
@@ -41,7 +41,7 @@ def __init__(self, definition):
             airr.schema.Schema : schema object.
         """
         # Load object definition
-        with resource_stream(__name__, 'specs/definitions.yaml') as f:
+        with resource_stream(__name__, 'specs/airr-schema.yaml') as f:
             spec = yaml.load(f, Loader=yamlordereddictloader.Loader)

         try:
File renamed without changes.
@@ -14,9 +14,9 @@
 #spec_files = {basename(f): f for f in glob('specs/*.yaml')}
 #py_files = {basename(f): f for f in glob('lang/python/airr/specs/*.yaml')}
 #r_files = {basename(f): f for f in glob('lang/R/inst/extdata/*.yaml')}
-spec_files = {basename(f): f for f in glob('specs/definitions.yaml')}
-py_files = {basename(f): f for f in glob('lang/python/airr/specs/definitions.yaml')}
-r_files = {basename(f): f for f in glob('lang/R/inst/extdata/definitions.yaml')}
+spec_files = {basename(f): f for f in glob('specs/airr-schema.yaml')}
+py_files = {basename(f): f for f in glob('lang/python/airr/specs/airr-schema.yaml')}
+r_files = {basename(f): f for f in glob('lang/R/inst/extdata/airr-schema.yaml')}

 # Check python package specs
 if set(spec_files.keys()) != set(py_files.keys()):
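This hunk keeps three copies of the schema (`specs/`, the Python package, and the R package) and compares their file names. A minimal sketch of a stricter check that also compares contents is shown below; the byte-for-byte comparison with `filecmp` is an assumption about what "in sync" should mean, not something this hunk shows, and `check_schema_copies` is a hypothetical name.

```python
import filecmp
from pathlib import Path
from typing import Dict, List


def check_schema_copies(canonical: Path, copies: Dict[str, Path]) -> List[str]:
    """Return human-readable problems; an empty list means all copies match.

    Hypothetical helper: each copy must exist and be byte-identical to the
    canonical file.
    """
    if not canonical.is_file():
        return [f"canonical schema missing: {canonical}"]
    problems = []
    for label, path in copies.items():
        if not path.is_file():
            problems.append(f"{label}: missing {path}")
        # shallow=False forces a content comparison instead of trusting os.stat()
        elif not filecmp.cmp(canonical, path, shallow=False):
            problems.append(f"{label}: {path} differs from {canonical}")
    return problems
```

Called from the repository root with `Path("specs/airr-schema.yaml")` as the canonical file and the Python and R copies as `copies`, a non-empty result would correspond to the script reporting a failure.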
@@ -22,7 +22,7 @@
     tsv_data = list(csv.DictReader(ip, dialect='excel-tab'))


-with open('specs/definitions.yaml', 'r') as ip:
+with open('specs/airr-schema.yaml', 'r') as ip:
     yaml_data = yaml.load(ip)


@@ -39,7 +39,7 @@
         failed = True


-# check for differences in fields between specs/definitions.yaml and
+# check for differences in fields between specs/airr-schema.yaml and
 # AIRR_Minimal_Standard_Data_Elements.tsv
 for dataset in miairr_dataset_to_api_object.keys():
     api_object = miairr_dataset_to_api_object[dataset]
@@ -48,7 +48,7 @@
         if row['MiAIRR data set / subset'] == dataset]
     yaml_object = yaml_data.get(api_object, None)
     if not yaml_object:
-        print(f'{api_object} not found in definitions.yaml.\n', file=sys.stderr)
+        print(f'{api_object} not found in airr-schema.yaml.\n', file=sys.stderr)
         failed = True
         continue
     yaml_fields = [property for property in yaml_object['properties'] if yaml_object['properties'][property].get('x-miairr')]
@@ -67,7 +67,7 @@
 # if yaml_data[miairr_api_object]['discriminator'] == 'MiAIRR':
 # airr_api_object = miairr_api_object.split('_')[1]
 # if airr_api_object not in yaml_data:
-# print(f'{airr_api_object} corresponding to {miairr_api_object} not found in definitions.yaml', file=sys.stderr)
+# print(f'{airr_api_object} corresponding to {miairr_api_object} not found in airr-schema.yaml', file=sys.stderr)
 # failed = True
 # continue

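The hunks above compare MiAIRR fields listed in `AIRR_Minimal_Standard_Data_Elements.tsv` against properties flagged with `x-miairr` in the renamed schema. The core of that comparison can be sketched with plain sets, assuming the YAML has already been loaded into nested dicts and the TSV field names have already been extracted (the column that holds them is not visible in this diff). `diff_miairr_fields` and the sample data are hypothetical.

```python
from typing import Any, Iterable, List, Mapping, Tuple


def diff_miairr_fields(
    schema: Mapping[str, Any], api_object: str, tsv_fields: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (only_in_tsv, only_in_schema), each sorted.

    Raises KeyError when the object is absent from the schema, mirroring the
    script's "not found in airr-schema.yaml" message.
    """
    if api_object not in schema:
        raise KeyError(f"{api_object} not found in airr-schema.yaml")
    properties = schema[api_object].get("properties", {})
    # Same filter as the yaml_fields comprehension: keep properties whose
    # x-miairr annotation is present and truthy.
    schema_fields = {name for name, spec in properties.items() if spec.get("x-miairr")}
    tsv = set(tsv_fields)
    return sorted(tsv - schema_fields), sorted(schema_fields - tsv)


if __name__ == "__main__":
    sample_schema = {
        "Study": {
            "properties": {
                "study_id": {"type": "string", "x-miairr": True},
                "study_title": {"type": "string", "x-miairr": True},
                "notes": {"type": "string"},
            }
        }
    }
    print(diff_miairr_fields(sample_schema, "Study", ["study_id", "lab_name"]))
    # → (['lab_name'], ['study_title'])
```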