new feature: download all data by type #111

sgosline · 2024-03-11T20:49:27Z

It'd be nice to get ALL data of a particular data type (transcriptomics, proteomics, etc.) regardless of source. Can you add a function to do this? We can also filter by source after the fact.

jjacobson95 · 2024-03-21T23:21:29Z

I could build this into its own function or class; however, I think that could get redundant or confusing for users, as this can already be done using the following commands:

import coderdata as cd

depmap = cd.DatasetLoader('depmap')
mpnst = cd.DatasetLoader('MPNST')
cptac = cd.DatasetLoader('cptac')
beataml = cd.DatasetLoader('beataml')
hcmi = cd.DatasetLoader('hcmi')

joined_data = cd.join_datasets(beataml,hcmi,cptac,depmap,mpnst)

joined_data.transcriptomics # all transcriptomics data
joined_data.proteomics # all proteomics data
joined_data.drugs # all drug data
joined_data.samples # all sample data
#  ... etc
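To make the "regardless of source, filter by source afterwards" idea concrete, here is a minimal sketch of a helper that gathers one data type across several loaded datasets and tags each row with its source. It is not part of coderdata: `collect_by_type` is a hypothetical name, and the stand-in datasets below use plain lists of dicts with made-up values instead of the tables the real loader returns.

```python
from types import SimpleNamespace


def collect_by_type(datasets, data_type):
    """Concatenate one data type across sources, tagging each row with its source.

    `datasets` maps a source name to an object exposing one attribute per data type.
    Sources that lack the requested data type are skipped.
    """
    rows = []
    for source, dataset in datasets.items():
        table = getattr(dataset, data_type, None)  # None if this source lacks the type
        if table is None:
            continue
        for row in table:
            rows.append({**row, "source": source})
    return rows


# Stand-in datasets (hypothetical values, not real coderdata content).
datasets = {
    "beataml": SimpleNamespace(
        transcriptomics=[{"sample": 1, "gene": "TP53", "value": 2.5}],
    ),
    "hcmi": SimpleNamespace(
        transcriptomics=[{"sample": 7, "gene": "TP53", "value": 1.0}],
        proteomics=[{"sample": 7, "gene": "TP53", "value": 0.3}],
    ),
}

# All transcriptomics rows, regardless of source.
all_rna = collect_by_type(datasets, "transcriptomics")

# Filtering by source after the fact.
hcmi_rna = [row for row in all_rna if row["source"] == "hcmi"]
```

Keeping the source as a column rather than as separate objects means a user who only cares about data type never has to know the shorthand dataset names.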

sgosline · 2024-03-22T15:01:57Z

Yes, but this assumes that people understand (and care about) the shorthand dataset names. For deep learning, they just need to know what type of data it is and how much there is. How about you rename DatasetLoader to something like data_by_source and create a new function called data_by_type that accepts 'transcriptomics', 'proteomics', 'dose_response', 'perturbation', 'copy_number', 'mutations', etc.? They can exist side by side.

You can add sources and data_types functions as well so that users can determine what to choose from. Then the above calls just become:

import coderdata as cd
# sources() is the proposed function listing the available source names
sources = cd.sources()
ds = {}
for so in sources:
    ds[so] = cd.data_by_source(so)
joined_data = cd.join_data_by_source(ds.values())

Do you have a document describing the general users and use cases of the package?
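As a rough illustration of how the proposed names could sit side by side, the sketch below implements `sources`, `data_types`, `data_by_source`, and `data_by_type` on top of a small in-memory registry. Everything here is an assumption for illustration: the registry, its contents, and the return shapes are hypothetical, and none of these functions are confirmed to exist in coderdata.

```python
# Hypothetical in-memory registry standing in for the real download layer.
# Keys are (source, data_type); values are lists of records with made-up values.
_REGISTRY = {
    ("depmap", "transcriptomics"): [{"sample": 1, "value": 0.4}],
    ("depmap", "copy_number"): [{"sample": 1, "value": 2.0}],
    ("cptac", "transcriptomics"): [{"sample": 9, "value": 1.7}],
    ("cptac", "proteomics"): [{"sample": 9, "value": 0.2}],
}


def sources():
    """List the source names a user can choose from."""
    return sorted({source for source, _ in _REGISTRY})


def data_types():
    """List the data types a user can choose from."""
    return sorted({data_type for _, data_type in _REGISTRY})


def data_by_source(source):
    """Return every data type available for one source, keyed by data type."""
    if source not in sources():
        raise ValueError(f"unknown source {source!r}; choose from {sources()}")
    return {t: recs for (s, t), recs in _REGISTRY.items() if s == source}


def data_by_type(data_type):
    """Return one data type from every source that has it, keyed by source."""
    if data_type not in data_types():
        raise ValueError(f"unknown data type {data_type!r}; choose from {data_types()}")
    return {s: recs for (s, t), recs in _REGISTRY.items() if t == data_type}


# A user who only cares about the data type never names a source up front.
rna = data_by_type("transcriptomics")
n_records = sum(len(recs) for recs in rna.values())
```

Both lookups read from the same registry, so the two entry points cannot drift apart, and the error messages point users back to `sources()` and `data_types()` to see what is available.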

jjacobson95 · 2024-03-22T20:57:09Z

Okay, will do. There is a general usage page in the docs, but I haven't had a chance to update it with the use cases. I'd like to link our tutorials directly to the docs but haven't had the time to do so yet. It takes quite a few extra steps with the CI blocked.

sgosline · 2024-03-22T21:05:42Z

Usage and use cases are not the same thing: use cases are the start of a design document that motivates the choices made in software development. Generally a good thing to have on hand to make detailed design decisions.

jjacobson95 · 2024-03-22T21:09:58Z

I didn't know about that - I'll add that as an issue.

sgosline · 2024-03-22T21:11:17Z

No need, it's not really a thing that can be fixed in the code base, just something that'll need to be done ahead of the paper/pub.

jjacobson95 · 2024-03-22T21:13:42Z

Shouldn't we keep track of it, as we will eventually need to add it to the docs?

sgosline · 2024-03-22T21:34:30Z

Docs are for end users; they do not need to know how or why the software was designed as it was. Use cases/specifications are for developers so they can make informed implementation choices. I believe there are some GitHub features to incorporate the full software engineering process, but I think that ship has sailed at this point :)

sgosline · 2024-11-12T01:11:16Z

Out of scope

sgosline added the enhancement label Mar 11, 2024

sgosline added this to CoderData Apr 10, 2024

sgosline assigned jjacobson95 Apr 10, 2024

sgosline moved this to Ready in CoderData Apr 10, 2024

jjacobson95 moved this from Ready to In progress in CoderData Apr 29, 2024

jjacobson95 moved this from In progress to Ready in CoderData Apr 29, 2024

jjacobson95 mentioned this issue Apr 29, 2024

Convert Python Package from Pandas to Polars #159

Open

jjacobson95 added the package label Apr 29, 2024

sgosline closed this as not planned Nov 12, 2024

github-project-automation bot moved this from Ready to Done in CoderData Nov 12, 2024
