Restructuring of Cell #776

javh · 2024-03-06T01:08:14Z

Initial changes to Cell, CellExpression and CellReactivity per March 4, 2024 call:

Remove Rearrangement and expression links in Cell.
Renamed CellExpression to Expression.
Renamed CellReactivity to Reactivity.

Also:

Added cell_subset, cell_phenotype, and cell_label for CL term, marker definition, and free text annotation, respectively.
Not sure I like these names, but I stuck with what was already in the CellProcessing object.
Not sure we need cell_phenotype, but I thought I'd include it for discussion purposes.

I only modified the main v2 spec. I will propagate once we agree on the changes.

Closes #768
Closes #477

scharch · 2024-03-06T03:46:37Z

Looks fine. Will need docs updates, at minimum for the name of Expression and Reactivity. But I think we should also add explicit guidance that says either "put RNAseq data in Expression" or explain how external data (eg GEO) can be linked at the Sample level.

bcorrie · 2024-03-06T18:44:44Z

Yes, looks good to me as well... Will hold off on review check until final is ready, but I think safe to sync the specs.

bussec · 2024-03-25T17:54:20Z

Looks good! Wondering whether we should also introduce a cell_label property into the CellProcessing object...

javh · 2024-03-25T18:33:07Z

From the call:

Add cell_label to CellProcessing.
Set Cell.cell_subset to miairr: important. Verify this is clear in the documentation.
Sync specs.
Merge.

bcorrie · 2024-03-29T16:56:25Z

Hate to say it, but I am trying to use the spec as it is now, and I see an issue...

Many of the Receptors in IEDB do not have the full variable domains for one chain, let along multiple chains. e.g. https://www.iedb.org/receptor/184739 has V, J, CDR3 for alpha and beta chains, but there is no full V domain.

The AIRR Spec has the V Domain for both chains as required and non-nullable. Is that really what we want. We would have no way of capturing IEDB:184739 in the AIRR Receptor schema if that was the case???

Some IEDB records do have the full domain (e.g. https://www.iedb.org/receptor/47) but this is not normal. Don't we want the Receptor object to be more flexible than that?

bcorrie · 2024-03-29T16:59:44Z

I am OK to make the above a separate issue so we can merge the current PR.

bcorrie · 2024-03-29T17:31:50Z

Asking the question another way. Lets say I analyze a single cell study and I find that some cells match (at some level of matching) a known Receptor in IEDB. Lets say the CDR3 and VJ genes are the same. So I create a Receptor object and link that Cell to that Receptor by adding the receptor_id to the Cell.receptors array.

But I am only allowed to do this if the IEDB entry is a paired chain with full V Domain for both chains. If you look at IEDB and search for paired chain receptors you are limited to 31,346 out of the 189,435 "Receptor like things" in IEDB. I am not 100% sure "paired chain == full v domain for both chains", so this might be limited even further.

This seems like a significant restriction. Should we be relaxing the constraints on the Receptor object?

scharch · 2024-03-29T17:33:32Z

Many of the Receptors in IEDB do not have the full variable domains for one chain, let along multiple chains. e.g. https://www.iedb.org/receptor/184739 has V, J, CDR3 for alpha and beta chains, but there is no full V domain.

It's a TCR, so the full V domain is implied. I think it's ok.

bcorrie · 2024-03-29T21:16:22Z

Many of the Receptors in IEDB do not have the full variable domains for one chain, let along multiple chains. e.g. https://www.iedb.org/receptor/184739 has V, J, CDR3 for alpha and beta chains, but there is no full V domain.

It's a TCR, so the full V domain is implied. I think it's ok.

Problem is that the receptor_variable_domain_1_aa field is required (as is the domain 2 field), non-nullable, and says:

        receptor_variable_domain_1_aa:
            type: string
            description: >
                Complete amino acid sequence of the mature variable domain of the Ig heavy, TCR beta or TCR delta chain.
                The mature variable domain is defined as encompassing all AA from and including first AA after the the
                signal peptide to and including the last AA that is completely encoded by the J gene.
            example: >
              QVQLQQPGAELVKPGASVKLSCKASGYTFTSYWMHWVKQRPGRGLEWIGRIDPNSGGTKYNEKFKSKATLTVDKPSSTAYMQLSSLTSEDSAVYYCARYDYYGSSYFDYWGQGTTLTVSS
            x-airr:
                nullable: false
                adc-query-support: true

So because we don't have the AA variable domains, you can't create an AIRR standards compliant Receptor object for https://www.iedb.org/receptor/184739

It seems to me that our requirements are to stringent.

bussec · 2024-03-29T22:46:44Z

The stringency in the schema is deliberate, as the purpose of Receptor was always to be complementary to the potentially incomplete receptor information in IEDB. People are free to download the IEDB database and all Receptor objects from the ADC and use whatever algorithm and arbitrary threshold they want to make guesses whether they are sufficiently similar. But I don't think that we need to degrade our schema for that.

bcorrie · 2024-03-31T16:29:31Z

I understand that use case, but if that is indeed the case, then we do not have a mechanism to link an AIRR entity to a known entity in another repository (e.g. IEDB) unless it meets those strict criteria.

There are 150,000+ such entities in IEDB that contain Receptor -> antigen/epitope reactivity information that we are unable to link to given the above constraints. My understanding of the AIRR Receptor object was that it would help us make such a link. You aren't telling me I have been wrong all this time are you 8-)

bcorrie · 2024-03-31T16:33:17Z

People are free to download the IEDB database and all Receptor objects from the ADC and use whatever algorithm and arbitrary threshold they want to make guesses whether they are sufficiently similar. But I don't think that we need to degrade our schema for that.

But shouldn't our schema have a mechanism to capture when such an algorithm is used to annotate an AIRR entity (Cell or Rearrangement) with such specificity/reactivity information.

bcorrie · 2024-03-31T16:39:14Z

Suggest we move this discussion to #781 and merge this PR.

javh · 2024-04-08T16:01:42Z

There's currently no Cell documentation to verify w.r.t. CL importance, so we'll come back to it.

Initial restructuring of Cell

14dfa11

javh added this to the AIRR 2.0 milestone Mar 6, 2024

javh requested review from bcorrie, scharch and bussec March 6, 2024 01:08

javh added this to To do in Single Cell via automation Mar 6, 2024

bussec approved these changes Mar 25, 2024

View reviewed changes

Single Cell automation moved this from To do to In progress Mar 25, 2024

bcorrie mentioned this pull request Mar 31, 2024

How do I capture known antigen/epitope reactivity to AIRR objects (Rearrangement/Cell) #781

Open

javh added 3 commits April 8, 2024 08:11

Sync specs and update cell_label

40968d1

punctuation!?...,;)

259755c

Sync Expression and Reactivity

020afc1

bcorrie approved these changes Apr 8, 2024

View reviewed changes

javh added 4 commits April 8, 2024 08:28

Add Cell.cell_label to test data

2e6300a

Missed some

05a6477

sync combined repertoire test data

b9c4795

fix field order in test data

4fd1bf9

javh merged commit 50c3ba2 into master Apr 8, 2024
5 checks passed

Single Cell automation moved this from In progress to Done Apr 8, 2024

javh deleted the issue-768 branch April 8, 2024 16:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restructuring of Cell #776

Restructuring of Cell #776

javh commented Mar 6, 2024

scharch commented Mar 6, 2024

bcorrie commented Mar 6, 2024

bussec commented Mar 25, 2024

javh commented Mar 25, 2024 •

edited

bcorrie commented Mar 29, 2024 •

edited

bcorrie commented Mar 29, 2024

bcorrie commented Mar 29, 2024 •

edited

scharch commented Mar 29, 2024

bcorrie commented Mar 29, 2024

bussec commented Mar 29, 2024

bcorrie commented Mar 31, 2024

bcorrie commented Mar 31, 2024

bcorrie commented Mar 31, 2024

javh commented Apr 8, 2024

Restructuring of Cell #776

Restructuring of Cell #776

Conversation

javh commented Mar 6, 2024

scharch commented Mar 6, 2024

bcorrie commented Mar 6, 2024

bussec commented Mar 25, 2024

javh commented Mar 25, 2024 • edited

bcorrie commented Mar 29, 2024 • edited

bcorrie commented Mar 29, 2024

bcorrie commented Mar 29, 2024 • edited

scharch commented Mar 29, 2024

bcorrie commented Mar 29, 2024

bussec commented Mar 29, 2024

bcorrie commented Mar 31, 2024

bcorrie commented Mar 31, 2024

bcorrie commented Mar 31, 2024

javh commented Apr 8, 2024

javh commented Mar 25, 2024 •

edited

bcorrie commented Mar 29, 2024 •

edited

bcorrie commented Mar 29, 2024 •

edited