Number of Profiled Cases count is off from legacy mutated-genes endpoint #20

Closed
haynescd opened this issue Jun 4, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@haynescd
Collaborator

haynescd commented Jun 4, 2024

Currently, the new (ClickHouse) endpoint for fetching AlterationsCountByGenes for mutations (/api/mutated-genes/fetch)
returns a totalProfiledCases count that is 4 below the legacy count.

Difference found at TFRC.numberOfProfiledCases: (Legacy) 13638 != (New) 13634

After some initial research, I found that 4 samples are not profiled at all. (I am not sure whether it makes sense for a study to contain samples that are not profiled at all.)

select count(distinct sample_id)
from sample_profile
  INNER JOIN sample on sample_profile.sample_id = sample.internal_id
  INNER JOIN patient AS p ON sample.patient_id = p.internal_id
  INNER JOIN cancer_study AS cs ON p.cancer_study_id = cs.cancer_study_id
where cancer_study_identifier = 'genie_public';
Returns 197976

select count(distinct sample_unique_id) from sample_view where cancer_study_identifier = 'genie_public';
Returns 197976

Query I used to determine which samples were not profiled.

select distinct sample_stable_id
from sample_view
where cancer_study_identifier = 'genie_public'
  and sample_stable_id not in (
    SELECT DISTINCT s.stable_id
    FROM sample_profile sp
      INNER JOIN sample s ON sp.sample_id = s.internal_id
      INNER JOIN patient p ON s.patient_id = p.internal_id
      INNER JOIN cancer_study cs ON p.cancer_study_id = cs.cancer_study_id
    WHERE cs.cancer_study_identifier = 'genie_public'
  );

Samples with no sample_profile rows:

  • GENIE-PROV-4a776902-triseq-v2
  • GENIE-PROV-aaf13ded-triseq-v2
  • GENIE-PROV-ac6c9f4e-triseq-v2
  • GENIE-PROV-ec1b3e39-triseq-v2
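
For anyone retesting, here is a minimal sketch of the same check as an anti-join, run against a toy in-memory SQLite database. The `sample` and `sample_profile` table names and the `internal_id`, `stable_id`, and `sample_id` columns follow the queries above. The `genetic_profile_id` column and all rows are made-up placeholders, not genie_public data. It uses `NOT EXISTS` rather than `NOT IN`, on the assumption that this is the safer form if the subquery could ever return a NULL.

```python
import sqlite3

# Toy in-memory schema. Table/column names mirror the queries in this issue;
# genetic_profile_id and every row below are hypothetical placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (internal_id INTEGER PRIMARY KEY, stable_id TEXT);
    CREATE TABLE sample_profile (sample_id INTEGER, genetic_profile_id INTEGER);
    INSERT INTO sample VALUES (1, 'S-1'), (2, 'S-2'), (3, 'S-3');
    INSERT INTO sample_profile VALUES (1, 10), (1, 11), (3, 10);
""")

# Anti-join: keep only samples that have no sample_profile row at all.
unprofiled = [row[0] for row in conn.execute("""
    SELECT s.stable_id
    FROM sample s
    WHERE NOT EXISTS (
        SELECT 1 FROM sample_profile sp WHERE sp.sample_id = s.internal_id
    )
    ORDER BY s.stable_id
""")]

total = conn.execute("SELECT COUNT(*) FROM sample").fetchone()[0]
profiled = conn.execute(
    "SELECT COUNT(DISTINCT sample_id) FROM sample_profile"
).fetchone()[0]

print(unprofiled)        # ['S-2']
print(total - profiled)  # 1
```

In this toy data, the gap between the total sample count and the profiled sample count equals the number of samples returned by the anti-join.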
haynescd added the bug label Jun 4, 2024
@sheridancbio
Contributor

This could be retested now that there has been a data update where all samples in genie_public have an assigned gene panel. With the latest development database this issue is likely fixed.

alisman closed this as completed Aug 27, 2024