Skip to content
Permalink
Browse files

Merge branch 'master' into issue-255_2

  • Loading branch information
schristley committed Nov 26, 2019
2 parents 4fa8ae9 + 6bff5a4 commit d4d54466185c297015feb908cb8f0fd7dd606ab1
@@ -14,8 +14,8 @@ work has been a joint effort between the AIRR Community's `Minimal
Standards`_ and `Data Representation`_ Working Groups.

.. _`FAIR`: https://www.force11.org/group/fairgroup/fairprinciples
.. _`Minimal Standards`: https://www.antibodysociety.org/airrc/working_groups/minimal_standards
.. _`Data Representation`: https://www.antibodysociety.org/airrc/working_groups/data_representation
.. _`Minimal Standards`: https://www.antibodysociety.org/the-airr-community/airr-working-groups/minimal_standards/
.. _`Data Representation`: https://www.antibodysociety.org/the-airr-community/airr-working-groups/data_representation


The MiAIRR Data Standard
@@ -65,7 +65,7 @@ Implementations
- `AIRR Common Repository`_ - *in development*

.. _`NCBI-based`: https://docs.airr-community.org/en/latest/miairr/miairr_ncbi_overview.html
.. _`AIRR Common Repository`: https://www.antibodysociety.org/airrc/working_groups/repository
.. _`AIRR Common Repository`: https://www.antibodysociety.org/the-airr-community/airr-working-groups/repository/


Contributions
@@ -18,8 +18,8 @@ RUN pip3 install \
biopython \
recommonmark \
sphinxcontrib-autoprogram \
sphinx_bootstrap_theme

sphinx_bootstrap_theme \
matplotlib

# Copy source
RUN mkdir /airr-standards
@@ -242,7 +242,6 @@ def __getattr__(cls, name): return MagicMock()
'Miscellaneous'),
]


# Load data for schemas
with open(os.path.abspath('../specs/airr-schema.yaml')) as ip:
airr_schema = yaml.load(ip, Loader=yamlordereddictloader.Loader)
@@ -263,10 +262,10 @@ def __getattr__(cls, name): return MagicMock()
# Write TSV
with open(os.path.join(dl_path, '%s.tsv' % spec), 'w') as f:
writer = csv.writer(f, dialect='excel-tab')
rows = ([k, v['type'], 'required' if k in required else 'optional', v['description'].strip()] \
for k, v in properties.items())
writer.writerow(fields)
writer.writerows(rows)
for k, v in properties.items():
row = [k, v['type'], 'required' if k in required else 'optional', v['description'].strip()]
writer.writerow(row)

# Build tables for repertoire metadata schema
tables = ['Repertoire', 'Study', 'Subject', 'Diagnosis', 'Sample', 'CellProcessing', 'NucleicAcidProcessing',
@@ -99,6 +99,32 @@ It is expected that typical ``Repertoires`` might only have a single
``data_processing_id`` will be semantically equivalent and only the
former should be used.

If a ``Repertoire`` has multiple sample processing objects in the sample
array then ``sample_processing_id`` should be used to distinguish the
the approrpiate sample processing object within the ``Repertoire``. The
``Rearrangement`` object can contain a ``sample_processing_id`` to uniquely
identify a sample processing object within a ``Repertoire``. Like
``data_processing_id``, the ``sample_processing_id`` is only unique within
the ``Repertoire`` so ``repertoire_id`` should first be used to get the
appropiate ``Repertoire`` object and then ``sample_processing_id`` should
be used to determine the appropiate sample processing object that is associated
with the ``Rearrangement``. If the ``Rearrangement`` object does not have a
``sample_processing_id`` then it can be assumed that the rearrangement is
associated with all of the samples in the ``Repertoire`` (e.g. the rearrangement
is a collapsed rearrangement across multiple samples).

It is expected that ``Repertoires`` might often have a single
sample processing object, in which case ``repertoire_id`` and
``sample_processing_id`` will be semantically equivalent and only the
former should be used.

Finally, if it is necessary to link a ``Rearrangement`` object with a unique
pairing of sample processing and ``DataProcessing``, the ``repertoire_id`` of
the ``Rearrangement`` object should be used to identify the correct ``Repertoire``
object and then the ``data_processing_id`` should be used to identify the correct
``DataProcessing`` metadata and the ``sample_processing_id`` should be used to
identify the correct sample processing metadata within that ``Repertoire``.

**Duality between Repertoires and Rearrangements**

There is an important duality relationship between ``Repertoires`` and
@@ -50,7 +50,7 @@ data and analysis results afterwards::
cd adc_example

# Invoke a shell session inside the docker image
docker run -it -v .:/data airrc/airr-standards:latest bash
docker run -it -v $PWD:/data airrc/airr-standards:latest bash

The first python script queries the data from the VDJServer data
repository and saves them into files::
@@ -61,96 +61,96 @@ repository and saves them into files::

Only a subset of the data is downloaded for illustration purposes, but
review the code to see how all data can be downloaded. A total of 40
repertoires and 293,414 rearrangements should be downloaded. The
repertoires and 300,178 rearrangements should be downloaded. The
repertoire metadata is saved in the ``repertoires.airr.json`` file,
and the rearrangements are saved in the ``rearrangements.tsv``
file. The script should take a few minutes to run and produce the
following display messages::

Info: AIRR Data Commons API
Info: VDJServer Community Data Portal
version: 1.3
description: API response for repertoire query
description: VDJServer ADC API response for repertoire query
Received 40 repertoires.
Retrieving rearrangements for repertoire: 2366080924918616551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 2366080924918616551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 2541616238306136551-242ac11c-0001-012
Retrieved 6114 rearrangements for repertoire: 2541616238306136551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 1993707260355416551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 1993707260355416551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 2197374609531736551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 2197374609531736551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 3438706057421656551-242ac11c-0001-012
Retrieved 8856 rearrangements for repertoire: 3438706057421656551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 3628844259615576551-242ac11c-0001-012
Retrieved 4765 rearrangements for repertoire: 3628844259615576551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 2989624276951896551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 2989624276951896551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 3252733973504856551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 3252733973504856551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 4744762662462296551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 4744762662462296551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 4931851437876056551-242ac11c-0001-012
Retrieved 9050 rearrangements for repertoire: 4931851437876056551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 4357957907784536551-242ac11c-0001-012
Retrieved 6782 rearrangements for repertoire: 4357957907784536551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 4476756703191896551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 4476756703191896551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 6205695788196696551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 6205695788196696551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 6393557657723736551-242ac11c-0001-012
Retrieved 6515 rearrangements for repertoire: 6393557657723736551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 7158276584776536551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 7158276584776536551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 5953881855632216551-242ac11c-0001-012
Retrieved 5837 rearrangements for repertoire: 5953881855632216551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 7461458326201176551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 7461458326201176551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 7640859110155096551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 7640859110155096551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 6964444710708056551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 6964444710708056551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 7313153105470296551-242ac11c-0001-012
Retrieved 8997 rearrangements for repertoire: 7313153105470296551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 8602072790999896551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 8602072790999896551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 8733756488295256551-242ac11c-0001-012
Retrieved 4844 rearrangements for repertoire: 8733756488295256551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 8263242821018456551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 8263242821018456551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 8425807333172056551-242ac11c-0001-012
Retrieved 6753 rearrangements for repertoire: 8425807333172056551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 7309695685264535065-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 7309695685264535065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 8485700680582295065-242ac11c-0001-012
Retrieved 3598 rearrangements for repertoire: 8485700680582295065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 9084118473933975065-242ac11c-0001-012
Retrieved 4354 rearrangements for repertoire: 9084118473933975065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 8961797805343895065-242ac11c-0001-012
Retrieved 1166 rearrangements for repertoire: 8961797805343895065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 5624006920930455065-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 5624006920930455065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 7066128089908375065-242ac11c-0001-012
Retrieved 5049 rearrangements for repertoire: 7066128089908375065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 5168912186246295065-242ac11c-0001-012
Retrieved 9768 rearrangements for repertoire: 5168912186246295065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 5338391595746455065-242ac11c-0001-012
Retrieved 5521 rearrangements for repertoire: 5338391595746455065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 4858300151399575065-242ac11c-0001-012
Retrieved 2885 rearrangements for repertoire: 4858300151399575065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 5039977268020375065-242ac11c-0001-012
Retrieved 4053 rearrangements for repertoire: 5039977268020375065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 6240077029868695065-242ac11c-0001-012
Retrieved 3506 rearrangements for repertoire: 6240077029868695065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 6389112395039895065-242ac11c-0001-012
Retrieved 2289 rearrangements for repertoire: 6389112395039895065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 5939858815878295065-242ac11c-0001-012
Retrieved 3637 rearrangements for repertoire: 5939858815878295065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 6088937130722455065-242ac11c-0001-012
Retrieved 9028 rearrangements for repertoire: 6088937130722455065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 7446748091679895065-242ac11c-0001-012
Retrieved 1540 rearrangements for repertoire: 7446748091679895065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 7591789137265815065-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 7591789137265815065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 7446748091679895065-242ac11c-0001-012
Retrieved 1527 rearrangements for repertoire: 7446748091679895065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 6088937130722455065-242ac11c-0001-012
Retrieved 8634 rearrangements for repertoire: 6088937130722455065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 5939858815878295065-242ac11c-0001-012
Retrieved 3393 rearrangements for repertoire: 5939858815878295065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 6389112395039895065-242ac11c-0001-012
Retrieved 2274 rearrangements for repertoire: 6389112395039895065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 6240077029868695065-242ac11c-0001-012
Retrieved 3480 rearrangements for repertoire: 6240077029868695065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 5039977268020375065-242ac11c-0001-012
Retrieved 3773 rearrangements for repertoire: 5039977268020375065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 4858300151399575065-242ac11c-0001-012
Retrieved 2612 rearrangements for repertoire: 4858300151399575065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 5338391595746455065-242ac11c-0001-012
Retrieved 5399 rearrangements for repertoire: 5338391595746455065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 5168912186246295065-242ac11c-0001-012
Retrieved 9642 rearrangements for repertoire: 5168912186246295065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 7066128089908375065-242ac11c-0001-012
Retrieved 5662 rearrangements for repertoire: 7066128089908375065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 5624006920930455065-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 5624006920930455065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 8961797805343895065-242ac11c-0001-012
Retrieved 1179 rearrangements for repertoire: 8961797805343895065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 9084118473933975065-242ac11c-0001-012
Retrieved 4464 rearrangements for repertoire: 9084118473933975065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 8485700680582295065-242ac11c-0001-012
Retrieved 3908 rearrangements for repertoire: 8485700680582295065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 7309695685264535065-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 7309695685264535065-242ac11c-0001-012
Retrieving rearrangements for repertoire: 8425807333172056551-242ac11c-0001-012
Retrieved 6863 rearrangements for repertoire: 8425807333172056551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 8263242821018456551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 8263242821018456551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 8733756488295256551-242ac11c-0001-012
Retrieved 5298 rearrangements for repertoire: 8733756488295256551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 8602072790999896551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 8602072790999896551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 7313153105470296551-242ac11c-0001-012
Retrieved 9121 rearrangements for repertoire: 7313153105470296551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 6964444710708056551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 6964444710708056551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 7640859110155096551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 7640859110155096551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 7461458326201176551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 7461458326201176551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 5953881855632216551-242ac11c-0001-012
Retrieved 5916 rearrangements for repertoire: 5953881855632216551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 7158276584776536551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 7158276584776536551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 6393557657723736551-242ac11c-0001-012
Retrieved 7257 rearrangements for repertoire: 6393557657723736551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 6205695788196696551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 6205695788196696551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 4476756703191896551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 4476756703191896551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 4357957907784536551-242ac11c-0001-012
Retrieved 7033 rearrangements for repertoire: 4357957907784536551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 4931851437876056551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 4931851437876056551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 4744762662462296551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 4744762662462296551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 3252733973504856551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 3252733973504856551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 2989624276951896551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 2989624276951896551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 3628844259615576551-242ac11c-0001-012
Retrieved 5208 rearrangements for repertoire: 3628844259615576551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 3438706057421656551-242ac11c-0001-012
Retrieved 9530 rearrangements for repertoire: 3438706057421656551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 2197374609531736551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 2197374609531736551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 1993707260355416551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 1993707260355416551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 2541616238306136551-242ac11c-0001-012
Retrieved 6512 rearrangements for repertoire: 2541616238306136551-242ac11c-0001-012
Retrieving rearrangements for repertoire: 2366080924918616551-242ac11c-0001-012
Retrieved 10000 rearrangements for repertoire: 2366080924918616551-242ac11c-0001-012

The second python script loads the data from the files and generates a
CDR3 amino acid length distribution that is grouped by the T cell
Binary file not shown.
@@ -48,7 +48,14 @@
"value":"456"
}
},
{
{
"op":"=",
"content": {
"field":"sample_processing_id",
"value":"789"
}
},
{
"op":"=",
"content": {
"field":"junction_aa",
@@ -15,4 +15,4 @@ v1.2 of the :ref:`AIRR Rearrangement schema <RearrangementSchema>`.
:file: tables/rearrangement_support.tsv
:delim: tab
:header-rows: 1
:widths: 30, 20, 50
:widths: 20, 10, 40, 30

0 comments on commit d4d5446

Please sign in to comment.
You can’t perform that action at this time.