
Dataset object populated with properties from h5ad uns #70


ubyndr (Collaborator) commented Apr 18, 2024

Fixes #67

Example output:

ns:9698a8ea-19de-4273-acea-3728a4180fc1 a obo:PCL_0010001 ;
    rdfs:label "Splatter" ;
    ns:supercluster_term "Splatter" ;
    obo:RO_0015003 ns:d9f4490f-9801-4835-b52a-60e2423d259f ;
    dcterms:source ns:c1ca0cd3-144e-469c-84ae-cc0081e1f6a1 .

ns:d9f4490f-9801-4835-b52a-60e2423d259f a [ a owl:Restriction ;
            owl:onProperty obo:RO_0002473 ;
            owl:someValuesFrom obo:CL_0000540 ],
        obo:PCL_0010001 ;
    rdfs:label "neuron" ;
    ns:cell_type "neuron" ;
    dcterms:source ns:c1ca0cd3-144e-469c-84ae-cc0081e1f6a1 .

ns:c1ca0cd3-144e-469c-84ae-cc0081e1f6a1 a schema:Dataset ;
    rdfs:label "dataset" ;
    ns:citation "Publication: https://doi.org/10.1126/science.add7046 Dataset Version: https://datasets.cellxgene.cziscience.com/0bb62fec-0cf1-46e1-9d10-de65a6d4f814.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/283d65eb-dd53-496d-adb7-7570c7caa443" ;
    ns:obs_meta "[[{\"field_name\": \"ROIGroup\", \"field_type\": \"author_cell_type_label\"}], [{\"field_name\": \"ROIGroupCoarse\", \"field_type\": \"author_cell_type_label\"}], [{\"field_name\": \"ROIGroupFine\", \"field_type\": \"author_cell_type_label\"}], [{\"field_name\": \"cluster_id\", \"field_type\": \"author_cell_type_label\"}], [{\"field_name\": \"dissection\", \"field_type\": \"author_cell_type_label\"}], [{\"field_name\": \"roi\", \"field_type\": \"author_cell_type_label\"}], [{\"field_name\": \"sample_id\", \"field_type\": \"author_cell_type_label\"}], [{\"field_name\": \"subcluster_id\", \"field_type\": \"author_cell_type_label\"}], [{\"field_name\": \"supercluster_term\", \"field_type\": \"author_cell_type_label\"}]]" ;
    ns:schema_reference "https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/5.0.0/schema.md" ;
    ns:schema_version "5.0.0" ;
    ns:title "All neurons" .
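In the output above, the dataset node carries each h5ad `uns` entry as a literal, and list-valued entries such as `obs_meta` are serialized as a JSON string. The following is a minimal, stdlib-only sketch of that flattening step. The helper names (`uns_to_literals`, `turtle_literal`, `dataset_properties`) and the use of the `ns:` prefix for every key are illustrative assumptions, not the code merged in this PR. The `uns` dict is a trimmed stand-in for real h5ad metadata.

```python
import json
from typing import Any, Dict, List


def uns_to_literals(uns: Dict[str, Any]) -> Dict[str, str]:
    """Flatten h5ad uns entries into string literals (hypothetical helper).

    Strings are kept as-is; lists, dicts and numbers are JSON-encoded,
    mirroring how obs_meta appears as a JSON string in the example output.
    """
    literals = {}
    for key, value in uns.items():
        literals[key] = value if isinstance(value, str) else json.dumps(value)
    return literals


def turtle_literal(text: str) -> str:
    """Quote a string as a short Turtle literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def dataset_properties(uns: Dict[str, Any]) -> List[str]:
    """Render one 'ns:<key> "<value>"' predicate-object pair per uns entry."""
    literals = uns_to_literals(uns)
    return [f"ns:{key} {turtle_literal(val)}" for key, val in sorted(literals.items())]


# Trimmed, illustrative uns content based on the example output above.
uns = {
    "schema_version": "5.0.0",
    "title": "All neurons",
    "obs_meta": [[{"field_name": "roi", "field_type": "author_cell_type_label"}]],
}
for line in dataset_properties(uns):
    print(line)
```

Encoding nested structures as JSON keeps every dataset property a plain literal, at the cost of making `obs_meta` opaque to SPARQL queries unless it is parsed downstream.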

I am currently using the following structure for the Dataset class:

DATASET = {"iri": "https://schema.org/Dataset", "label": "dataset"}
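One way the `DATASET` dict could produce the `a schema:Dataset ; rdfs:label "dataset"` lines in the example output is to compact its IRI against a prefix map. Only the `DATASET` dict below comes from this PR. The `PREFIXES` table and the `compact_iri` and `dataset_header` helpers are hypothetical and serve only as an illustration.

```python
from typing import Dict

# Class definition as used in this PR.
DATASET = {"iri": "https://schema.org/Dataset", "label": "dataset"}

# Hypothetical prefix table; the schema.org namespace uses the https form found in DATASET.
PREFIXES: Dict[str, str] = {
    "schema": "https://schema.org/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def compact_iri(iri: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Return prefix:local for the longest matching namespace, else <iri>."""
    best = None
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)
    if best is None:
        return f"<{iri}>"
    prefix, namespace = best
    return f"{prefix}:{iri[len(namespace):]}"


def dataset_header(node: str, cls: Dict[str, str] = DATASET) -> str:
    """Render the rdf:type and rdfs:label statements for a minimal dataset node."""
    return f'{node} a {compact_iri(cls["iri"])} ;\n    rdfs:label "{cls["label"]}" .'


print(dataset_header("ns:c1ca0cd3-144e-469c-84ae-cc0081e1f6a1"))
```

Keeping the class as a plain `{"iri", "label"}` dict makes it easy to swap in a different vocabulary term later without touching the serialization code.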

I'm open to changing this; please let me know if you would prefer a different structure.

ubyndr requested review from dosumis and hkir-dev on April 18, 2024 10:09
dosumis (Contributor) commented Apr 18, 2024

Looks good. Thinking about it, the main use of the CxG citation field is probably to provide a blob of text that authors can paste into their publications. For indexing purposes, I think it is also important to split its contents into distinct fields:

dc:publication: DOI
Dataset_version: ... # Is there a schema.org property for this?
Collection: ...

This could be an additional ticket, as we need to think through and coordinate how this will work with SCXA h5ad files.
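The field split suggested above could be sketched as a regular expression over the CxG citation text. The sketch assumes that the labelled URLs always appear in the order `Publication:`, `Dataset Version:`, `Collection:`, as they do in the example output. That is an assumption, not a documented CxG format. The names `split_citation` and `doi_from_url` are hypothetical, and the mapping of these fields to RDF predicates is left open. For the dataset version, schema.org's `version` property, which applies to `CreativeWork` and so to `Dataset`, appears to be a candidate.

```python
import re
from typing import Dict, Optional

# Citation text copied from the example output above.
CITATION = (
    "Publication: https://doi.org/10.1126/science.add7046 "
    "Dataset Version: https://datasets.cellxgene.cziscience.com/"
    "0bb62fec-0cf1-46e1-9d10-de65a6d4f814.h5ad curated and distributed by "
    "CZ CELLxGENE Discover in Collection: "
    "https://cellxgene.cziscience.com/collections/283d65eb-dd53-496d-adb7-7570c7caa443"
)

# Assumes the three labelled URLs appear in this order, separated by whitespace.
CITATION_PATTERN = re.compile(
    r"Publication:\s*(?P<publication>\S+)\s+"
    r"Dataset Version:\s*(?P<dataset_version>\S+)\s+"
    r".*?Collection:\s*(?P<collection>\S+)"
)


def split_citation(citation: str) -> Optional[Dict[str, str]]:
    """Split a CxG citation blob into publication, dataset_version and collection."""
    match = CITATION_PATTERN.search(citation)
    return match.groupdict() if match else None


def doi_from_url(url: str) -> str:
    """Strip the https://doi.org/ resolver prefix to leave the bare DOI."""
    prefix = "https://doi.org/"
    return url[len(prefix):] if url.startswith(prefix) else url


fields = split_citation(CITATION)
print(fields)
print(doi_from_url(fields["publication"]))
```

Returning `None` rather than raising lets the caller fall back to storing only the raw citation blob when the text does not follow the expected pattern, for example with h5ad files from other sources.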

ubyndr merged commit e774404 into main on Apr 18, 2024
ubyndr deleted the 67-rdf-representation-should-include-dataset-object-populated-with-properties-from-h5ad-uns branch on April 18, 2024 11:29
Successfully merging this pull request may close this issue: RDF representation should include dataset object populated with properties from h5ad uns (#67)
2 participants